ABSTRACT

The limiting distribution of the normalized number of comparisons used by Quicksort
to sort an array of n numbers is known to be the unique fixed point with zero mean
of a certain distributional transformation S. We study the convergence to the limiting
distribution of the sequence of distributions obtained by iterating the transformation S,
beginning with a (nearly) arbitrary starting distribution. We demonstrate geometrically
fast convergence for various metrics and discuss some implications for numerical
calculations of the limiting Quicksort distribution. Finally, we give companion lower
bounds which show that the convergence is not faster than geometric.

1 Introduction and summary

The Quicksort algorithm of Hoare M is "one of the fastest, the best-known, the most 
generalized, . . . and the most widely used algorithms for sorting an array of numbers" pj. 
Quicksort is the standard sorting procedure in Unix systems, and in a special issue of 
Computing in Science & Engineering, guest editors Jack Dongarra and Francis Sullivan 



(HI; see also [1C]) chose Quicksort as one of the ten algorithms "with the greatest influ- 
ence on the development and practice of science and engineering in the 20th century." 
Our goal in this introductory section is to review briefly some of what is known about 
the analysis of Quicksort and to summarize how this paper advances that analysis. 

The Quicksort algorithm for sorting an array of n numbers is extremely simple to 
describe. If $n = 0$ or $n = 1$, there is nothing to do. If $n \ge 2$, pick a number uniformly
at random from the given array. Compare the other numbers to it to partition the 
remaining numbers into two subarrays. Then recursively invoke Quicksort on each of 
the two subarrays. 
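
The description above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function name `quicksort_comparisons` and the list-based partitioning are assumptions made here, and each partitioning step is charged $n - 1$ comparisons (one per non-pivot element), which is the counting convention used for $X_n$ below.

```python
import random


def quicksort_comparisons(values, rng):
    """Sort `values` with uniformly random pivots.

    Returns (sorted_list, comparisons), charging len(values) - 1
    comparisons for each partitioning step.
    """
    n = len(values)
    if n <= 1:
        return list(values), 0  # nothing to do for n = 0 or n = 1
    pivot_index = rng.randrange(n)  # pivot chosen uniformly at random
    pivot = values[pivot_index]
    rest = values[:pivot_index] + values[pivot_index + 1:]
    smaller = [v for v in rest if v < pivot]
    larger = [v for v in rest if v >= pivot]
    sorted_small, count_small = quicksort_comparisons(smaller, rng)
    sorted_large, count_large = quicksort_comparisons(larger, rng)
    return sorted_small + [pivot] + sorted_large, (n - 1) + count_small + count_large


rng = random.Random(0)
data = [rng.random() for _ in range(200)]
ordered, comparisons = quicksort_comparisons(data, rng)
```

Here `comparisons` is one realization of the random variable $X_{200}$ studied in the rest of the paper.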

Let $X_n$ denote the (random) number of comparisons required (so that $X_0 = 0$).
Then $X_n$ satisfies the distributional recurrence relation

\[ X_n \stackrel{\mathcal{L}}{=} X_{U_n - 1} + X^*_{n - U_n} + n - 1, \qquad n \ge 1, \]

where $\stackrel{\mathcal{L}}{=}$ denotes equality in law (i.e., in distribution), and where, on the right, $U_n$ is
distributed uniformly on the set $\{1, \dots, n\}$, $X^*_j \stackrel{\mathcal{L}}{=} X_j$, and

\[ U_n;\ X_0, \dots, X_{n-1};\ X^*_0, \dots, X^*_{n-1} \]

are all independent.

As is well known and quite easily established, for $n \ge 0$ we have

\[ \mu_n := \mathbf{E} X_n = 2(n + 1)H_n - 4n \sim 2n \ln n, \]

where $H_n := \sum_{k=1}^{n} k^{-1}$ is the $n$th harmonic number and $\sim$ denotes asymptotic equivalence.
It is also routine to compute explicitly the standard deviation of $X_n$ (see Exercise
6.2.2-8 in [12]), which turns out to be $\sim n\sqrt{7 - \tfrac{2}{3}\pi^2}$.
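
As a quick check of the closed form for the mean, taking expectations in the distributional recurrence gives $\mu_n = n - 1 + \frac{2}{n}\sum_{k=0}^{n-1}\mu_k$ with $\mu_0 = 0$. The sketch below (function names are illustrative choices, not from the paper) compares this recurrence with $2(n+1)H_n - 4n$ in exact rational arithmetic.

```python
from fractions import Fraction


def mean_by_recurrence(n_max):
    """Return [mu_0, ..., mu_{n_max}] from mu_n = n - 1 + (2/n) * sum_{k<n} mu_k."""
    mu = [Fraction(0)]
    running_sum = Fraction(0)  # holds mu_0 + ... + mu_{n-1}
    for n in range(1, n_max + 1):
        running_sum += mu[n - 1]
        mu.append(n - 1 + Fraction(2, n) * running_sum)
    return mu


def mean_closed_form(n):
    """Closed form 2(n + 1) H_n - 4n, with H_n the n-th harmonic number."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n


means = mean_by_recurrence(50)
agree = all(means[n] == mean_closed_form(n) for n in range(51))
```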
Consider the normalized variate 

\[ Y_n := (X_n - \mu_n)/n, \qquad n \ge 1. \tag{1.1} \]

Régnier [14] showed using martingale arguments that $Y_n \to Y$ in distribution, with $Y$
satisfying the distributional identity

\[ Y \stackrel{\mathcal{L}}{=} UY + (1 - U)Y^* + g(U) =: h_{Y,Y^*}(U), \tag{1.2} \]

where

\[ g(u) := 2u \ln u + 2(1 - u) \ln(1 - u) + 1, \tag{1.3} \]

and where, on the right of $\stackrel{\mathcal{L}}{=}$ in (1.2), $U$, $Y$, and $Y^*$ are independent, with $Y^* \stackrel{\mathcal{L}}{=} Y$ and
$U \sim \mathrm{unif}(0, 1)$. Rösler [15] showed that (1.2) characterizes the limiting law $\mathcal{L}(Y)$, in the
precise sense that $F := \mathcal{L}(Y)$ is the unique fixed point of the operator

\[ G = \mathcal{L}(V) \mapsto SG := \mathcal{L}(UV + (1 - U)V^* + g(U)) \tag{1.4} \]

(in what should now be obvious notation) subject to

\[ \mathbf{E} V = 0, \qquad \operatorname{Var} V < \infty. \]

[The fixed points of $S$ with finite mean are the translates $\mathcal{L}(Y + c)$ with $c$ constant, but
there are other fixed points without mean; see [?] for a complete characterization.]

Rösler [15] showed that the moment generating function of the limiting distribution
$\mathcal{L}(Y)$ is everywhere finite. We have studied the limiting distribution further in [?],
showing that $\mathcal{L}(Y)$ has a density $f$ which is infinitely differentiable, and that each
derivative $f^{(k)}(y)$ is bounded and decays as $y \to \pm\infty$ more rapidly than any power
of $|y|^{-1}$. (This improves an earlier result by Tan and Hadjicostas [18].)

The purpose of the present paper is to study the convergence to the limiting distribution
$\mathcal{L}(Y)$ of the sequence of distributions obtained by iterating Rösler's operator $S$
in (1.4), beginning with a (nearly) arbitrary starting distribution. To fix notation, we
let $Z_0$ be an arbitrary random variable, and $F_0 := \mathcal{L}(Z_0)$ its distribution. We define,
for $n \ge 1$,

\[ Z_n := U Z_{n-1} + (1 - U) Z^*_{n-1} + g(U), \]

with $Z^*_{n-1} \stackrel{\mathcal{L}}{=} Z_{n-1}$ and $Z_{n-1}$, $Z^*_{n-1}$, and $U$ independent; in other words,

\[ F_n := \mathcal{L}(Z_n) = S^n F_0, \qquad n \ge 0. \]
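
A small worked consequence of this definition, derived here rather than quoted from the paper: if $\mathbf{E} Z_0 = 0$, then $\mathbf{E} Z_n = 0$ for all $n$ (because $\mathbf{E} g(U) = 0$), and since $\mathbf{E} U^2 + \mathbf{E}(1 - U)^2 = 2/3$ the variances satisfy $\operatorname{Var} Z_n = \tfrac{2}{3}\operatorname{Var} Z_{n-1} + \mathbf{E} g(U)^2$. Applying the same identity to the fixed point $Y$ forces $\mathbf{E} g(U)^2 = \sigma^2/3$ with $\sigma^2 = \operatorname{Var} Y = 7 - \tfrac{2}{3}\pi^2$. The sketch below checks the last identity by a midpoint rule and iterates the variance recursion; the function names and the number of quadrature cells are illustrative assumptions.

```python
import math

SIGMA2 = 7.0 - 2.0 * math.pi ** 2 / 3.0  # sigma^2 = Var Y


def g(u):
    """g(u) = 2u ln u + 2(1 - u) ln(1 - u) + 1, with 0 ln 0 taken as 0."""
    def xlogx(x):
        return x * math.log(x) if x > 0.0 else 0.0
    return 2.0 * xlogx(u) + 2.0 * xlogx(1.0 - u) + 1.0


def second_moment_of_g(num_cells=100_000):
    """Midpoint-rule approximation of E g(U)^2 for U ~ unif(0, 1)."""
    h = 1.0 / num_cells
    return h * sum(g((i + 0.5) * h) ** 2 for i in range(num_cells))


def variance_of_iterates(var_z0, n_max):
    """Var Z_0, ..., Var Z_{n_max} for a mean-zero start, via v -> (2/3) v + sigma^2 / 3."""
    variances = [var_z0]
    for _ in range(n_max):
        variances.append(2.0 / 3.0 * variances[-1] + SIGMA2 / 3.0)
    return variances


eg2 = second_moment_of_g()
variances = variance_of_iterates(0.0, 50)
```

Starting from $Z_0 = 0$, the recursion gives $\operatorname{Var} Z_n = \sigma^2(1 - (2/3)^n)$, so the variances approach $\sigma^2$ geometrically.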

Let $\|X\|_2 := (\mathbf{E} X^2)^{1/2}$ denote the $L^2$-norm, and let $d_2$ denote the metric on the space
of probability distributions with finite variance defined by

\[ d_2(F, G) := \min \|X - Y\|_2, \tag{1.5} \]

taking the minimum over all pairs of random variables $X$ and $Y$ (defined on the same
probability space) with $\mathcal{L}(X) = F$ and $\mathcal{L}(Y) = G$. Note that, using the coupling with
$X$ and $Y$ independent, for any $F$ and $G$ each with zero mean and finite variance,

\[ d_2(F, G) \le (\mathbf{E} X^2 + \mathbf{E} Y^2)^{1/2} \le \|X\|_2 + \|Y\|_2, \tag{1.6} \]

when $\mathcal{L}(X) = F$ and $\mathcal{L}(Y) = G$. Rösler [15] showed that if $Z_0$ has mean 0 and finite
variance, then $F_n \to F$ in the $d_2$-distance with a geometric rate:

\[ d_2(F_n, F) \le (2/3)^{n/2} d_2(F_0, F) \le (2/3)^{n/2} (\operatorname{Var} Z_0 + \sigma^2)^{1/2}, \tag{1.7} \]

where

\[ \sigma^2 := \operatorname{Var} Y = 7 - \tfrac{2}{3}\pi^2 \approx 0.42. \tag{1.8} \]

Our main interest is to show similar estimates for other measures of the distance between
$F_n$ and $F$.
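
To get a feel for the rate in (1.7), the following sketch evaluates its right-hand side and finds how many iterations of $S$ the bound requires for a given tolerance. The function names and the linear search are illustrative choices, not part of the paper.

```python
import math

SIGMA2 = 7.0 - 2.0 * math.pi ** 2 / 3.0  # sigma^2 = Var Y from (1.8)


def d2_bound(n, var_z0=0.0):
    """Right-hand side of (1.7): (2/3)^(n/2) * (Var Z_0 + sigma^2)^(1/2)."""
    return (2.0 / 3.0) ** (n / 2.0) * math.sqrt(var_z0 + SIGMA2)


def iterations_needed(tolerance, var_z0=0.0):
    """Smallest n for which the bound (1.7) is at most `tolerance`."""
    n = 0
    while d2_bound(n, var_z0) > tolerance:
        n += 1
    return n


n_required = iterations_needed(1e-6)  # start from Z_0 = 0
```

Because the rate is $(2/3)^{1/2} \approx 0.816$ per iteration, each additional decimal digit of accuracy in this bound costs roughly 11 to 12 further iterations.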

We will show in Section 3, using estimates of the characteristic functions given in
[?] and Section 2, that the distribution $F_n$ has a bounded, continuous density function
$f_n$, at least as soon as $n \ge 3$, and that, if $Z_0$ has mean 0 and finite variance, then $f_n$
converges uniformly to $f$, with a geometric rate of convergence, as $n \to \infty$. We further
show geometrically fast convergence in the total variation and Kolmogorov-Smirnov
distances, too.

In Section 4 we give bounds for the moment generating functions of $Y$ and of $Z_n$. In
Section 5 we show that if $Z_0$ has mean 0 and a finite moment generating function $\psi_0$,
then the moment generating function $\psi_n$ of $F_n$ is finite and converges uniformly on
compact intervals to the moment generating function of $Y$, again with a geometric rate
of convergence. We study in particular the cases $Z_0 = 0$ and $Z_0$ normally distributed
with zero mean and sufficiently large variance; it turns out that in these cases $\psi_n(\lambda)$
converges monotonically.

In Section 6 we discuss some implications for numerical calculations of the limiting
Quicksort distribution $F$, showing how explicit and arbitrarily small error bounds can
be obtained.

Finally, in Sections 7-8 we give some companion lower bounds, showing that the
convergence is not faster than geometrical for several different metrics. We also show
geometrically fast convergence in the $d_p$ metric for any finite $p$.

Remark 1.1. The mode and rate of convergence of the distribution of the actual normalized
Quicksort variables $Y_n$ of (1.1) to the limit $F$ is a quite different matter, which
will be studied in another paper [?].



2 Bounds on the characteristic functions 

In [?] we gave bounds on the characteristic function of $Y$. The same method yields,
more generally, bounds on the characteristic function of $Z_n$ for arbitrary $Z_0$. We write
$\phi_X(t) := \mathbf{E} e^{itX}$ for any random variable $X$.

Theorem 2.1. For every real $p \ge 0$ there is a constant $0 < c_p < \infty$ such that for any
$Z_0$ and any $n \ge p + 1$, the characteristic function $\phi_{Z_n}(t)$ satisfies

\[ |\phi_{Z_n}(t)| \le c_p |t|^{-p} \quad \text{for all } t \in \mathbf{R}. \tag{2.1} \]

The best possible constants $c_p$ satisfy $c_0 = 1$, $c_{1/2} \le 2$, $c_{3/4} \le \sqrt{8\pi}$, $c_1 \le 4\pi$, $c_{3/2} < 187$,
$c_{5/2} < 103215$, $c_{7/2} < 197102280$, and the relation

\[ c_{p+1} \le 2^{p+1} c_p^{1+(1/p)}\, p/(p - 1), \qquad p > 1; \tag{2.2} \]

moreover, at least if we restrict (2.1) to $n \ge p + 2$,

\[ c_p \le 2^{p^2 + 6p}, \qquad p > 0. \tag{2.3} \]

[The bounds on the constants $c_p$ obtained here are the same as for the special case
$Z_0 = Y$ (whence $Z_n = Y$ for every $n$) in [?]. However, there is no reason to believe that
our method yields the best possible bounds, and the best constants for the special case
in [?] may be smaller than the best constants in Theorem 2.1 here.]



Proof. The proof is almost identical to the proof of the special case in [?], so we will
omit some details. For any random variable $Z$, we abuse notation slightly and denote
by $SZ$ the random variable $h_{Z,Z^*}(U) = UZ + (1 - U)Z^* + g(U)$ where $U$, $Z$, and $Z^*$
are independent, with $Z^* \stackrel{\mathcal{L}}{=} Z$ and $U \sim \mathrm{unif}(0, 1)$; thus $SZ$ is a random variable with
the distribution $S\mathcal{L}(Z)$. By conditioning on $U$, we obtain the fundamental relation

\[ \phi_{SZ}(t) = \int_0^1 \phi_Z(ut)\, \phi_Z((1 - u)t)\, e^{itg(u)}\, du, \qquad t \in \mathbf{R}, \tag{2.4} \]

and thus the estimate

\[ |\phi_{SZ}(t)| \le \int_0^1 |\phi_Z(ut)|\, |\phi_Z((1 - u)t)|\, du. \tag{2.5} \]



To complete the proof, we give a series of lemmas.

Lemma 2.2. For any real numbers $y$ and $z$, the random variable $h_{y,z}(U)$ defined by
(1.2) satisfies

\[ |\mathbf{E} e^{ith_{y,z}(U)}| \le 2|t|^{-1/2}. \]

Proof. This follows by a method of van der Corput [2, 13, ?], using little more than the
fact that $h_{y,z}$ is convex with $h''_{y,z} \ge 8$ on $(0, 1)$. □

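Lemma 2.3. For any random variable $Z$ and real $t$, we have $|\phi_{SZ}(t)| \le 2|t|^{-1/2}$.

Proof. Lemma 2.2 yields

\[ |\phi_{SZ}(t)| = |\mathbf{E} e^{ith_{Z,Z^*}(U)}| \le \mathbf{E}\, \bigl|\mathbf{E}\bigl(e^{ith_{Z,Z^*}(U)} \mid Z, Z^*\bigr)\bigr| \le 2|t|^{-1/2}. \]

□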



Returning to our sequence $(Z_n)$, the preceding lemma applies to all elements except
$Z_0$, i.e.,

\[ |\phi_{Z_n}(t)| \le 2|t|^{-1/2}, \qquad n \ge 1, \tag{2.6} \]

which yields the case $p = 1/2$ of Theorem 2.1. We improve the exponent by induction,
using (2.5).

Lemma 2.4. Let $0 < p < 1$. If $|\phi_Z(t)| \le c_p |t|^{-p}$, $t \in \mathbf{R}$, then

\[ |\phi_{SZ}(t)| \le \frac{[\Gamma(1 - p)]^2}{\Gamma(2 - 2p)}\, c_p^2\, |t|^{-2p}, \qquad t \in \mathbf{R}. \]

Proof. By (2.5) and the hypothesis,

\[ |\phi_{SZ}(t)| \le \int_0^1 c_p^2\, |ut|^{-p}\, |(1 - u)t|^{-p}\, du = c_p^2 |t|^{-2p} \int_0^1 u^{-p}(1 - u)^{-p}\, du, \]

and the result follows by evaluating the beta integral. □



In particular, using (2.6), Lemma 2.4 yields

\[ |\phi_{Z_n}(t)| \le 4\pi |t|^{-1}, \qquad n \ge 2. \tag{2.7} \]

This proves (2.1) for $p = 1$, with $c_1 \le 4\pi$. Since $|\phi_{Z_n}(t)| \le 1$, for any $p \le 1$ we trivially
have $|\phi_{Z_n}(t)| \le |\phi_{Z_n}(t)|^p$, which by (2.7) establishes (2.1) for all $p \le 1$ with $c_p \le (4\pi)^p$;
applying Lemma 2.4 again, we obtain (2.1) for all $p < 2$. Somewhat better numerical
bounds are obtained for $1/2 < p < 1$ by taking a geometric average between the cases
$p = 1/2$ and $p = 1$; this yields $c_p \le 2^{2p} \pi^{2p-1}$, $1/2 \le p \le 1$. In particular, we have
$c_{3/4} \le \sqrt{8\pi}$, and thus, by Lemma 2.4, $c_{3/2} \le 8\pi^{1/2} [\Gamma(1/4)]^2 < 186.4 < 187$.
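
The constants in this paragraph are easy to reproduce numerically. The sketch below (function names are illustrative, not from the paper) evaluates the beta-integral factor of Lemma 2.4 with `math.gamma` and the geometric-average bound $2^{2p}\pi^{2p-1}$.

```python
import math


def lemma_2_4_constant(p, c_p):
    """Gamma(1 - p)^2 / Gamma(2 - 2p) * c_p^2, the constant in Lemma 2.4 (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("Lemma 2.4 is stated for 0 < p < 1")
    return math.gamma(1.0 - p) ** 2 / math.gamma(2.0 - 2.0 * p) * c_p ** 2


def interpolated_constant(p):
    """Geometric average of the cases p = 1/2 and p = 1: 2^(2p) * pi^(2p - 1)."""
    return 2.0 ** (2.0 * p) * math.pi ** (2.0 * p - 1.0)


c_one = lemma_2_4_constant(0.5, 2.0)                          # 4 pi, as in (2.7)
c_three_quarters = interpolated_constant(0.75)                # sqrt(8 pi)
c_three_halves = lemma_2_4_constant(0.75, c_three_quarters)   # 8 sqrt(pi) Gamma(1/4)^2
```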

Lemma 2.5. Let $p > 1$. If $|\phi_Z(t)| \le c_p |t|^{-p}$, $t \in \mathbf{R}$, then

\[ |\phi_{SZ}(t)| \le \frac{2^{p+1} c_p^{1+(1/p)}\, p}{p - 1}\, |t|^{-(p+1)}. \]

Proof. This is similar to the proof of Lemma 2.4, substituting the hypothesis (and the
trivial $|\phi_Z| \le 1$) into (2.5), but the estimate of the integral is slightly more complicated;
for details see [?]. □



Lemma 2.5 completes, by induction, the proof of (2.1) and the estimate (2.2).
The bound for $c_{3/2}$ obtained above and (2.2) now yield (using Maple) first $c_{5/2} <
103215$ and then $c_{7/2} < 197102280$. These bounds and (2.2) further yield

\[ c_p \le 2^{p^2 + 5p}, \qquad p = k + \tfrac{1}{2}, \tag{2.8} \]

for integers $k \ge 0$; again see [?] for details. To obtain (2.3) if $p > 1/2$, let $p_1 := \lceil p - \tfrac{1}{2} \rceil + \tfrac{1}{2}$.
Then, by (2.1) and (2.8), provided $n \ge p + 2 > p_1 + 1$,

\[ |\phi_{Z_n}(t)|^{1/p} \le |\phi_{Z_n}(t)|^{1/p_1} \le 2^{p_1 + 5} |t|^{-1} \le 2^{p + 6} |t|^{-1}. \]

The case $p \le 1/2$ follows similarly from (2.6), which completes the proof of Theorem 2.1.

□
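
The recursion (2.2) can be iterated directly. The sketch below is an independent floating-point re-computation, not the Maple computation mentioned in the proof; it assumes the starting value is the unrounded bound $8\pi^{1/2}[\Gamma(1/4)]^2$ for $c_{3/2}$ rather than 187, and the function name is an illustrative choice.

```python
import math


def next_constant(p, c_p):
    """Right-hand side of (2.2): 2^(p + 1) * c_p^(1 + 1/p) * p / (p - 1), for p > 1."""
    if p <= 1.0:
        raise ValueError("(2.2) is stated for p > 1")
    return 2.0 ** (p + 1.0) * c_p ** (1.0 + 1.0 / p) * p / (p - 1.0)


c_3_2 = 8.0 * math.sqrt(math.pi) * math.gamma(0.25) ** 2  # about 186.39
c_5_2 = next_constant(1.5, c_3_2)   # compare with the stated bound 103215
c_7_2 = next_constant(2.5, c_5_2)   # compare with the stated bound 197102280


def bound_2_8(p):
    """The cruder closed-form bound (2.8), 2^(p^2 + 5p), for half-integer p."""
    return 2.0 ** (p * p + 5.0 * p)
```

The constants grow very quickly with $p$, which is why only a few explicit values are used in the density bounds of Section 3.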



Remark 2.6. A variety of other bounds are possible. For example, if we begin with
the inequality (2.7) and use (2.5), we can easily derive the following result:

\[ |\phi_{Z_n}(t)| \le \frac{32\pi^2}{t^2}\left(\ln\frac{t}{4\pi} + 2\right) \le \frac{32\pi^2 \ln t}{t^2} \quad \text{for all } t \ge 1.72 \text{ and } n \ge 3. \tag{2.9} \]



3 Convergence of densities

It is easily checked that the random variable $h_{y,z}(U)$ is absolutely continuous for every
fixed $y$ and $z$, and thus, by mixing, $SZ$ is absolutely continuous for every $Z$. In other
words, for any $Z_0$, the random variables $Z_n$ have densities for all $n \ge 1$; cf. [18]. These
densities may be unbounded and discontinuous, at least for $n = 1$, as is seen in the case
$Z_0 = 0$. However, we now can show that for $n \ge 3$, at least, no such irregularities occur.

Theorem 3.1. If $n \ge 3$, then $Z_n$ has a bounded continuous density function $f_n$, for
any $Z_0$. More generally, if $k \ge 0$, then $f_n$ is $k$ times continuously differentiable for all
$n \ge k + 3$, and there exists a constant $C_k$ independent of $Z_0$ and $n$ (with $n \ge k + 3$) such
that $|f_n^{(k)}(x)| \le C_k$, $x \in \mathbf{R}$. Explicitly, $|f_n(x)| < 16$ when $n \ge 5$, and $|f'_n(x)| < 2466$
when $n \ge 6$.



Proof. Theorem 2.1 shows, in particular, that as soon as $n \ge 3$,

\[ |\phi_{Z_n}(t)| \le \min(1, 187|t|^{-3/2}), \]

and thus $\phi_{Z_n}$ is integrable. This implies, as is well-known (see e.g., [?, Theorem XV.3.3])
that $Z_n$ has a bounded continuous density $f_n$ given by the Fourier inversion formula

\[ f_n(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \phi_{Z_n}(t)\, dt, \qquad x \in \mathbf{R}. \tag{3.1} \]

Moreover, using Theorem 2.1 with $p = k + \tfrac{3}{2}$, we see that $t^k \phi_{Z_n}(t)$ is also integrable
when $n \ge k + 3$, which by a standard argument shows that $f_n$ is $k$ times differentiable,
with

\[ f_n^{(k)}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} (-it)^k e^{-itx} \phi_{Z_n}(t)\, dt, \qquad x \in \mathbf{R}; \tag{3.2} \]

and thus

\[ \sup_x |f_n^{(k)}(x)| \le \frac{1}{2\pi} \int_{-\infty}^{\infty} |t|^k |\phi_{Z_n}(t)|\, dt, \tag{3.3} \]

where the latter integral can be estimated using Theorem 2.1 with $p = k + \tfrac{3}{2}$.
The argument above yields the bound

\[ |f_n(x)| \le \frac{1}{2\pi} \int_{-\infty}^{\infty} \min(1, 187|t|^{-3/2})\, dt = \frac{3}{\pi}\, 187^{2/3} < 31.3, \qquad n \ge 3. \tag{3.4} \]

To obtain better numerical bounds we combine Theorem 2.1 for $p = 0$, 1/2, 1, 3/2,
5/2, 7/2 and (2.9) (for $t$ in different intervals; see [?] for details); this yields, provided
$n \ge 5$, $|f_n(x)| \le \frac{1}{2\pi} \int |\phi_{Z_n}| < 15.3$; similarly, invoking also (2.1) with $p = 9/2$,
$|f'_n(x)| \le \frac{1}{2\pi} \int |t|\, |\phi_{Z_n}(t)|\, dt < 2465.9$ for $n \ge 6$. □
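
The closed form in (3.4) comes from splitting the integral at $T = 187^{2/3}$: the plateau contributes $2T$ and the two tails contribute $4 \cdot 187\, T^{-1/2} = 4T$, giving $6T/(2\pi)$. The sketch below checks this against a midpoint rule; the window length and cell count are arbitrary illustrative choices, and the function names are not from the paper.

```python
import math


def density_bound_closed_form(c=187.0):
    """(3 / pi) * c^(2/3), the value of (1 / 2 pi) * integral of min(1, c |t|^(-3/2))."""
    return 3.0 / math.pi * c ** (2.0 / 3.0)


def density_bound_numeric(c=187.0, t_max=1000.0, cells=200_000):
    """Midpoint rule on (0, t_max) plus the exact tail beyond t_max.

    Assumes t_max exceeds c^(2/3), so that the integrand equals
    c * t^(-3/2) on the whole tail.
    """
    h = t_max / cells
    window = h * sum(min(1.0, c * ((i + 0.5) * h) ** -1.5) for i in range(cells))
    tail = 2.0 * c / math.sqrt(t_max)  # integral of c t^(-3/2) over (t_max, infinity)
    # Double for the negative half-line, then divide by 2 pi.
    return (window + tail) / math.pi


closed = density_bound_closed_form()
numeric = density_bound_numeric()
```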
Theorem 3.2. Suppose that $\mathbf{E} Z_0 = 0$ and $\operatorname{Var} Z_0 < \infty$. Then the density functions $f_n$
of Theorem 3.1 converge uniformly to the (smooth) density function $f$ of $Y$ at a geometric rate:

\[ \sup_x |f_n(x) - f(x)| = O(r^n) \quad \text{for every fixed } r > (2/3)^{1/2}. \]

Explicitly, for any $p > 1$ and $n \ge p + 1$,

\[ \sup_x |f_n(x) - f(x)| \le \frac{A}{2\pi} \left(\frac{2c_p}{A}\right)^{2/(p+1)} \frac{p + 1}{p - 1} \left(\frac{2}{3}\right)^{\left(\frac{1}{2} - \frac{1}{p+1}\right) n}, \tag{3.5} \]

where $A := (\operatorname{Var} Z_0 + \sigma^2)^{1/2}$ and $c_p$ is as in Theorem 2.1. In particular,

\[ \sup_x |f_n(x) - f(x)| \le 2297 A \left(\frac{2}{3}\right)^{5n/18} \le 2297 A\, (0.8935)^n, \qquad n \ge 5. \tag{3.6} \]

Moreover,

\[ \sup_x |f_n(x) - f(x)| \le \frac{128 A}{\pi} \left(\frac{2}{3}\right)^{(n/2) - 3.7\sqrt{n}}, \qquad n \ge 3. \tag{3.7} \]


Proof. By the Fourier inversion formula (3.1),

\[ |f_n(x) - f(x)| \le \frac{1}{2\pi} \int_{-\infty}^{\infty} |\phi_{Z_n}(t) - \phi_Y(t)|\, dt. \tag{3.8} \]

In order to estimate the right hand side, note that for any random variables $X$ and $Y$,

\[ |\phi_X(t) - \phi_Y(t)| \le \mathbf{E}|e^{itX} - e^{itY}| \le \mathbf{E}|tX - tY| \le |t|\, \|X - Y\|_2; \]

since the characteristic functions here depend on the marginal distributions only, this
and the definition (1.5) yield

\[ |\phi_X(t) - \phi_Y(t)| \le |t|\, d_2(\mathcal{L}(X), \mathcal{L}(Y)). \]

In particular, with $d_n := d_2(F_n, F)$,

\[ |\phi_{Z_n}(t) - \phi_Y(t)| \le |t|\, d_n. \tag{3.9} \]

Further, for any $p > 1$ and $n \ge p + 1$, Theorem 2.1 yields the estimate

\[ |\phi_{Z_n}(t) - \phi_Y(t)| \le |\phi_{Z_n}(t)| + |\phi_Y(t)| \le 2c_p |t|^{-p}. \]

Consequently, for any $T > 0$,

\[ \int_{-\infty}^{\infty} |\phi_{Z_n}(t) - \phi_Y(t)|\, dt \le \int_{-T}^{T} d_n |t|\, dt + \int_{|t| > T} 2c_p |t|^{-p}\, dt = d_n T^2 + \frac{4c_p}{p - 1}\, T^{-(p-1)}. \]

For given $n$ and $p$, the optimal choice here is $T := (2c_p/d_n)^{1/(p+1)}$, giving the bound

\[ \int_{-\infty}^{\infty} |\phi_{Z_n}(t) - \phi_Y(t)|\, dt \le \frac{p + 1}{p - 1}\, (2c_p)^{2/(p+1)}\, d_n^{1 - 2/(p+1)}. \tag{3.10} \]

With (3.8) and the estimate (1.7), this yields (3.5). Choosing $p = 7/2$ and evaluating
the constants numerically, using $A \ge \sigma > 0.648$, we obtain (3.6).

To obtain the final estimate, we use (2.3) and observe that, for $p \ge 2$,

\[ \left(\frac{2c_p}{A}\right)^{2/(p+1)} \le \left(\frac{2^{p^2 + 6p + 1}}{\sigma}\right)^{2/(p+1)} = 2^{2(p+5)} \left(2^4 \sigma\right)^{-2/(p+1)} \le 2^{2(p+5)}\, \frac{p - 1}{p + 1}, \]

which by (3.5) yields that for $n \ge p + 2 \ge 4$,

\[ \sup_x |f_n(x) - f(x)| \le \frac{A}{2\pi}\, 2^{2p + 10} \left(\frac{2}{3}\right)^{\left(\frac{1}{2} - \frac{1}{p+1}\right) n}. \]

Choosing the optimal $p := [n \ln(3/2)/(2 \ln 2)]^{1/2} - 1$, we find (3.7) [with the constant
$(8(\ln 2)/\ln(3/2))^{1/2} < 3.69812$ multiplying $\sqrt{n}$], at least when $n \ge 31$. For $3 \le n \le 30$,
(3.7) follows trivially from (3.4), since the right hand side of (3.7) then is larger than
193. □



To test out Theorem 3.2 numerically, choose $Z_0 = 0$, so that $A = \sigma \approx 0.648$. For
$n = 100$, (3.6) yields the bound 0.0192; for $n \ge 177$, (3.7) is better, and yields for
example $3.21 \times 10^{-6}$ for $n = 177$, $2.07 \times 10^{-6}$ for $n = 180$, and $1.07 \times 10^{-7}$ for $n = 200$.
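
The figures quoted in this paragraph can be re-derived from (3.6) and (3.7) with a few lines of code. The function names below are illustrative; the default argument uses the unrounded $\sigma = (7 - \tfrac{2}{3}\pi^2)^{1/2}$ for $A$, so the last digits may differ slightly from values computed with a rounded $\sigma$.

```python
import math

SIGMA = math.sqrt(7.0 - 2.0 * math.pi ** 2 / 3.0)  # A = sigma when Z_0 = 0


def bound_3_6(n, a=SIGMA):
    """Right-hand side of (3.6): 2297 * A * (2/3)^(5n/18), stated for n >= 5."""
    if n < 5:
        raise ValueError("(3.6) is stated for n >= 5")
    return 2297.0 * a * (2.0 / 3.0) ** (5.0 * n / 18.0)


def bound_3_7(n, a=SIGMA):
    """Right-hand side of (3.7): (128 A / pi) * (2/3)^(n/2 - 3.7 sqrt(n)), stated for n >= 3."""
    if n < 3:
        raise ValueError("(3.7) is stated for n >= 3")
    return 128.0 * a / math.pi * (2.0 / 3.0) ** (n / 2.0 - 3.7 * math.sqrt(n))


def best_bound(n, a=SIGMA):
    """Smaller of the two explicit bounds on sup |f_n - f|."""
    return min(bound_3_6(n, a), bound_3_7(n, a))
```

For example, `best_bound(100)` is governed by (3.6), while for `best_bound(200)` the bound (3.7) is the smaller one.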

Remark 3.3. Similarly, using (3.2), we obtain geometric uniform convergence of the
first derivatives, and of any higher derivatives, of the density functions.



Remark 3.4. Suppose that $Z_0$ has finite moments of all orders. Then, by Lemma 7.2
below, $\mathbf{E}|Z_n|^p$ is finite and stays bounded in $n$, for each real $0 < p < \infty$. It follows that
the characteristic functions $\phi_{Z_n}$ are infinitely differentiable with derivatives bounded
uniformly in $n$. If we apply both Theorem 2.1 and (3.9) to $|\phi_{Z_n}(t) - \phi_Y(t)|$ and take
the geometric mean of the resulting bounds, we find, for $n \ge 2p + 2$,

\[ |\phi_{Z_n}(t) - \phi_Y(t)| \le \left[2c_{2p+1} |t|^{-2p}\, d_2(F_n, F)\right]^{1/2}. \]



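It follows easily by induction on $k$, using [?, Lemma 2.10], that in fact, for every real
$p \ge 0$ and integer $k \ge 0$, there is a constant $c_{p,k}$ [depending on $\mathcal{L}(Z_0)$] such that for all
$n \ge 2^{k+1} p + 2$ we have, with $\rho_k := (2/3)^{2^{-k-2}} < 1$,

\[ \sup_{t \in \mathbf{R}} |t|^p\, |\phi_{Z_n}^{(k)}(t) - \phi_Y^{(k)}(t)| \le c_{p,k}\, \rho_k^n. \]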

Omitting details, since the Fourier transform is continuous on the Schwartz space [?]

\[ \mathcal{S} := \{ f : \sup_t |t|^p |f^{(k)}(t)| < \infty \text{ for all } p, k \ge 0 \}, \]

it follows that for each $k$ and $p$, $|x|^p f_n^{(k)}(x)$ converges uniformly to $|x|^p f^{(k)}(x)$ with
geometric rate.

Theorem 3.2 treats uniform approximation of $f$ by $f_n$, using the norm $\|f_n - f\|_\infty :=
\sup_x |f_n(x) - f(x)|$. We now turn to studying the error in the $L^1$-norm $\|f_n - f\|_1 :=
\int_{-\infty}^{\infty} |f_n - f|$.

Note first that, because $\int f_n = \int f = 1$,

\[ \frac{1}{2} \int_{-\infty}^{\infty} |f_n(x) - f(x)|\, dx = \int_{-\infty}^{\infty} (f(x) - f_n(x))^+\, dx, \]

and that this coincides with the total variation distance

\[ d_{TV}(F_n, F) := \sup_{A \subset \mathbf{R}} |\mathbf{P}(Z_n \in A) - \mathbf{P}(Y \in A)|; \]

moreover, it dominates the Kolmogorov-Smirnov distance

\[ d_{KS}(F_n, F) := \sup_x |\mathbf{P}(Z_n \le x) - \mathbf{P}(Y \le x)| \le d_{TV}(F_n, F). \]

Theorem 3.5. Suppose that $\mathbf{E} Z_0 = 0$ and $\operatorname{Var} Z_0 < \infty$. Then the total variation
and Kolmogorov-Smirnov distances between $F_n$ and $F$ converge geometrically to 0:
$d_{KS}(F_n, F) \le d_{TV}(F_n, F) = O(r^n)$ for every fixed $r > (2/3)^{1/2}$. Explicitly, for any
$n \ge 1$,

\[ d_{KS}(F_n, F) \le d_{TV}(F_n, F) \le 135 A n \left(\frac{2}{3}\right)^{(n/2) - 3.7\sqrt{n}}. \tag{3.11} \]

Proof. For any $a \in (0, 1)$,

\[ d_{TV}(F_n, F) = \int_{-\infty}^{\infty} (f(x) - f_n(x))^+\, dx \le \|f_n - f\|_\infty^{1-a} \int_{-\infty}^{\infty} f(x)^a\, dx, \tag{3.12} \]

where $\|f_n - f\|_\infty$ is estimated in Theorem 3.2. The final integral can be estimated by
Hölder's inequality: for any $b > 0$,

\[ \begin{aligned}
\int_{-\infty}^{\infty} f(x)^a\, dx &= \int_{-\infty}^{\infty} f(x)^a e^{ab|x|} \cdot e^{-ab|x|}\, dx \\
&\le \left(\int_{-\infty}^{\infty} f(x) e^{b|x|}\, dx\right)^{a} \left(\int_{-\infty}^{\infty} e^{-ab|x|/(1-a)}\, dx\right)^{1-a} \\
&\le [\psi(b) + \psi(-b)]^a \left(\frac{2(1 - a)}{ab}\right)^{1-a} \\
&= \frac{2}{ab} \left(\frac{\psi(b) + \psi(-b)}{2}\, ab\right)^{a} (1 - a)^{1-a},
\end{aligned} \tag{3.13} \]

where $\psi(\lambda) := \mathbf{E} e^{\lambda Y}$ is the moment generating function of $Y$. Rösler [15] proved that
$\psi(\lambda)$ is finite for all $\lambda$; thus $\int f^a < \infty$ for every $a \in (0, 1)$, and the first claim follows by
(3.12) and Theorem 3.2.

For (3.11) we choose $b = 1/3$, for which it will be shown in Theorem 4.1 below that
$\psi(\pm b) \le \exp(1/9) < 1.2$, and thus (3.13) implies $\int f^a < 2/(ab) = 6/a$. Denoting
the right hand side of (3.7) by $B$, we thus obtain from (3.12) and (3.7), observing that
$B \ge (3/2)^{-n/2}$,

\[ d_{TV}(F_n, F) \le \frac{6}{a} B^{1-a} \le \frac{6}{a} (3/2)^{an/2} B. \]

We optimize by taking $a := 2/(n \ln(3/2))$ and obtain the following bound (for $n \ge 5$, so
that $a < 1$; smaller $n$ are trivial since $d_{TV} \le 1$):

\[ d_{TV}(F_n, F) \le 3en \ln(3/2)\, B = \frac{384 e \ln(3/2)}{\pi}\, A n \left(\frac{2}{3}\right)^{(n/2) - 3.7\sqrt{n}}. \]

□

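Remark 3.6. If we are content with a weaker explicit bound, we can avoid invoking
estimates of $\psi$ by using moments of $Y$ instead. For example,

\[ \int_{-\infty}^{\infty} f(x)^{1/2}\, dx \le \left(\int_{-\infty}^{\infty} f(x)(\sigma^2 + x^2)\, dx\right)^{1/2} \left(\int_{-\infty}^{\infty} \frac{dx}{\sigma^2 + x^2}\right)^{1/2} = (2\pi\sigma)^{1/2} < 2.1 \]

and thus

\[ d_{KS}(F_n, F) \le d_{TV}(F_n, F) \le 2.1\, \|f_n - f\|_\infty^{1/2}. \]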



4 Bounds on moment generating functions 

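Letting $\psi_Z(\lambda) := \mathbf{E} e^{\lambda Z}$ denote the moment generating function of a random variable
$Z$, we find in analogy with (2.4) the relation

\[ \psi_{SZ}(\lambda) = \int_0^1 \psi_Z(u\lambda)\, \psi_Z((1 - u)\lambda)\, e^{\lambda g(u)}\, du, \qquad \lambda \in \mathbf{R}. \tag{4.1} \]

In particular, it follows that if $\psi_Z(\lambda)$ is finite for all $\lambda$, then so is $\psi_{SZ}(\lambda)$.

Rösler [15] proved that the moment generating function $\psi_Y$ is everywhere finite and
that for every $L > 0$ there is a constant $K_L$ such that

\[ \psi_Y(\lambda) \le e^{K_L \lambda^2}, \qquad |\lambda| \le L. \tag{4.2} \]

Moreover, it is implicit in the proof that

\[ \text{if } \psi_Z(\lambda) \le e^{K_L \lambda^2} \text{ for } |\lambda| \le L, \text{ then } \psi_{SZ}(\lambda) \le e^{K_L \lambda^2} \text{ for } |\lambda| \le L. \tag{4.3} \]

Note that (4.3) implies by induction that if we choose $Z_0 = 0$, then $\psi_{Z_n}(\lambda) \le e^{K_L \lambda^2}$,
$|\lambda| \le L$, for every $n$, and thus (4.2) follows by Fatou's lemma. More generally, if (4.3)
holds and $\psi_{Z_0}(\lambda) \le e^{K_L \lambda^2}$, $|\lambda| \le L$, then by induction $\psi_{Z_n}(\lambda) \le e^{K_L \lambda^2}$, $|\lambda| \le L$, for
every $n$.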

Rösler did not give explicit values of the constants $K_L$, but such values can be
obtained from his proof as follows. [Actually, Rösler [15] treated the somewhat more
complicated case of the variables $Y_n$ of (1.1); see [?] for explicit constants in that case.
In our case there are some simplifications leading to better constants. Moreover, we
introduce some deviations from Rösler's proof designed to improve our bounds.]

Theorem 4.1. Let $L_0 \approx 5.018$ be the largest root of $e^L = 6L^2$. Then (4.2) and (4.3)
hold with

\[ K_L = \begin{cases} 1, & L \le 0.42, \\ 12, & 0.42 < L \le L_0, \\ 2L^{-2} e^L, & L_0 < L, \end{cases} \]

or any larger number. In particular, we can always take $K_L = \max(2L^{-2} e^L, 12)$.
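
The constants of Theorem 4.1 are simple to evaluate. The sketch below locates $L_0$ by bisection and implements the piecewise definition of $K_L$; the bracketing interval $[4, 6]$ is an assumption suggested by the stated value 5.018 (the function $e^L - 6L^2$ is increasing there), and the function names are illustrative.

```python
import math


def largest_root_L0(lo=4.0, hi=6.0, iterations=60):
    """Bisection for the root of e^L = 6 L^2 inside the bracket [lo, hi]."""
    def f(x):
        return math.exp(x) - 6.0 * x * x
    if not (f(lo) < 0.0 < f(hi)):
        raise ValueError("bracket does not contain a sign change")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


L0 = largest_root_L0()


def K(L):
    """Smallest constant K_L stated in Theorem 4.1 for a given L > 0."""
    if L <= 0.42:
        return 1.0
    if L <= L0:
        return 12.0
    return 2.0 * math.exp(L) / (L * L)
```

The two upper branches agree at $L = L_0$, since $2L_0^{-2} e^{L_0} = 12$ by the definition of $L_0$.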



For $\lambda < 0$, we can obtain much better estimates. [For (4.3), we restrict to $\lambda \le 0$ in
both the assumption and the conclusion.]

Theorem 4.2. We have (4.2) and (4.3) for $\lambda \le 0$ with

\[ K_L = \begin{cases} 0.5, & L \le 0.62, \\ 1.25, & 0.62 < L, \end{cases} \]

or any larger number. In particular, we can always take $K_L = 1.25$ for $\lambda \le 0$.

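Proof of Theorems 4.1 and 4.2. If $\psi_Z(\lambda) \le e^{K\lambda^2}$ for $|\lambda| \le L$, then by (4.1), for $|\lambda| \le L$,

\[ \psi_{SZ}(\lambda) \le \int_0^1 e^{K\lambda^2 [u^2 + (1 - u)^2] + \lambda g(u)}\, du = e^{K\lambda^2}\, \mathbf{E} e^{\lambda g(U) - 2K\lambda^2 U(1 - U)}. \]

Hence, (4.3) holds with $K_L = K$ if (and only if)

\[ f_K(\lambda) := \mathbf{E} e^{\lambda g(U) - 2K\lambda^2 U(1 - U)} \le 1, \quad \text{when } |\lambda| \le L. \tag{4.4} \]

Similarly, (4.3) holds with $K_L = K$ for $\lambda \ge 0$ (respectively, for $\lambda \le 0$) if (4.4) holds for
$0 \le \lambda \le L$ (resp., for $-L \le \lambda \le 0$). Clearly, $f_K(\lambda)$ decreases as $K$ increases, and thus
if some $K$ satisfies (4.4), then so does any larger $K$.

Following Rösler, we argue differently for small and large $L$ in order to find a $K$
satisfying (4.4). For small $L$ we use a Taylor expansion. By straightforward differentiations,

\[ \begin{aligned}
f_K(0) &= 1, \\
f'_K(0) &= \mathbf{E} g(U) = 0, \\
f''_K(0) &= \mathbf{E}\bigl(g(U)^2 - 4KU(1 - U)\bigr) = \tfrac{1}{3}\sigma^2 - \tfrac{2}{3}K, \\
f'''_K(\lambda) &= \mathbf{E}\Bigl[\bigl((g(U) - 4K\lambda U(1 - U))^3 - 12KU(1 - U)(g(U) - 4K\lambda U(1 - U))\bigr) \\
&\qquad \times \exp\bigl(\lambda g(U) - 2K\lambda^2 U(1 - U)\bigr)\Bigr].
\end{aligned} \]

We write the last formula as $f'''_K(\lambda) = \mathbf{E}[X(U, \lambda)]$ and note that $0 \le U(1 - U) \le 1/4$
and $-\eta \le g(U) \le 1$, where

\[ \eta := -g(\tfrac{1}{2}) = 2 \ln 2 - 1 \approx 0.386. \]

Consider first $\lambda \ge 0$. By Taylor's formula, for $0 \le \lambda \le L$,

\[ \begin{aligned}
f_K(\lambda) &\le 1 + \tfrac{1}{2}\lambda^2 f''_K(0) + \tfrac{1}{6}\lambda^3 \sup_{0 \le \lambda \le L} f'''_K(\lambda) \\
&\le 1 + \tfrac{1}{6}\lambda^2 \Bigl(\sigma^2 - 2K + L \sup_{0 \le \lambda \le L} f'''_K(\lambda)\Bigr),
\end{aligned} \]

so (4.4) is satisfied for $\lambda \ge 0$ provided

\[ L \sup_{0 \le \lambda \le L} f'''_K(\lambda) \le 2K - \sigma^2. \tag{4.5} \]

If $g(U) \ge 0$, we find

\[ X(U, \lambda) \le (1 + 3K^2 L) e^L, \qquad 0 \le \lambda \le L; \]

while if $g(U) < 0$, we find

\[ X(U, \lambda) \le 3K(\eta + KL), \qquad 0 \le \lambda \le L. \]

For $K \ge 1$, in either case, because $3\eta > 1$,

\[ X(U, \lambda) \le (3K\eta + 3K^2 L) e^L, \qquad 0 \le \lambda \le L, \]

and thus

\[ L \sup_{0 \le \lambda \le L} f'''_K(\lambda) \le L(3K\eta + 3K^2 L) e^L. \]

It is readily checked that this is less than $2K - \sigma^2$ so that (4.5) holds, for $K = 1$ and
$L = 0.42$.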

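For larger $L$, we begin by another crude estimate. Let $W = U/2$ be uniformly distributed
on $(0, 1/2)$. Then, by $|g(U)| \le 1$ and symmetry,

\[ \begin{aligned}
f_K(\lambda) &\le e^{|\lambda|}\, \mathbf{E} \exp(-2K\lambda^2 U(1 - U)) = e^{|\lambda|}\, \mathbf{E} \exp(-2K\lambda^2 W(1 - W)) \\
&\le e^{|\lambda|}\, \mathbf{E} \exp(-K\lambda^2 W) = e^{|\lambda|} \int_0^1 \exp(-K\lambda^2 u/2)\, du \\
&= e^{|\lambda|}\, \frac{1 - \exp(-K\lambda^2/2)}{K\lambda^2/2} =: g_K(\lambda).
\end{aligned} \tag{4.6} \]

Note that $g_K$, too, decreases if $K$ is increased. Taking the logarithmic derivative,
we find for $\lambda > 0$,

\[ (\ln g_K(\lambda))' = 1 - \frac{2}{\lambda} + K\lambda e^{-K\lambda^2/2} \bigl(1 - \exp(-K\lambda^2/2)\bigr)^{-1}. \tag{4.7} \]

For $\lambda \ge 2$, this is evidently positive, and thus $g_K$ then is increasing. Hence, if $K \ge
\tilde{K} := 2L^{-2} e^L$, then

\[ g_K(\lambda) \le g_K(L) \le g_{\tilde{K}}(L) = 1 - \exp(-e^L) < 1, \qquad 2 \le \lambda \le L. \]

For smaller $\lambda$, we take $K = 12$, and check numerically that $g_{12}(0.42) < 1$. Moreover,

\[ e^{K\lambda^2/2} - 1 = e^{6\lambda^2} - 1 \ge 6\lambda^2 + 18\lambda^4 \]

and further, if $1/3 \le \lambda \le 1$,

\[ \left(1 - \frac{\lambda}{2}\right)^{-1} \le 1 + \lambda \le 1 + 3\lambda^2 \]

and thus

\[ \frac{K\lambda}{e^{K\lambda^2/2} - 1} \le \frac{12\lambda}{6\lambda^2 (1 + 3\lambda^2)} \le \frac{2}{\lambda}\left(1 - \frac{\lambda}{2}\right) = \frac{2}{\lambda} - 1. \]

Hence, (4.7) shows that $g_{12}$ is decreasing on $[1/3, 1]$, and thus

\[ g_{12}(\lambda) \le g_{12}(0.42) < 1, \qquad 0.42 \le \lambda \le 1. \]

Finally,

\[ g_{12}(\lambda) \le \frac{e^{\lambda}}{6\lambda^2} \le \frac{e}{6} < 1, \qquad 1 \le \lambda \le 2. \]

Combining these estimates we find that if $K \ge \max(12, 2L^{-2} e^L)$, then $f_K(\lambda) \le g_K(\lambda) \le
1$ whenever $0.42 \le \lambda \le L$, while $f_K(\lambda) \le f_1(\lambda) \le 1$ for $0 \le \lambda \le 0.42$, and thus (4.4)
holds when $\lambda \ge 0$.

We have also shown that $K = 12$ will do for $L \le 2$ and $\lambda \ge 0$; since $2L^{-2} e^L$ is
increasing for $L \ge 2$, and thus less than 12 for $2 \le L < L_0$ but larger than 12 for
$L > L_0$, Theorem 4.1 for $\lambda \ge 0$ follows.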

For $\lambda < 0$, we again use Taylor's formula for small $|\lambda|$; arguing as above we see that
(4.4) holds for $\lambda \le 0$ provided

\[ L \sup_{-L \le \lambda \le 0} (-f'''_K(\lambda)) \le 2K - \sigma^2. \tag{4.8} \]

It is easily checked numerically that $\max_{0 \le u \le 1} u(1 - u) g(u) < 0.033$. It follows that

\[ X(u, \lambda) \ge (-\eta^3 - 0.396K - 3K^2 L) e^{\eta L}, \qquad -L \le \lambda \le 0. \]

Hence, (4.8) holds and (4.4) is satisfied for $\lambda \le 0$ provided

\[ (\eta^3 + 0.396K + 3K^2 L) L e^{\eta L} \le 2K - \sigma^2. \]

It is readily checked that this holds for $K = 0.5$ and $L = 0.62$.

For larger $L$ we argue as follows. The function $h(u) := g(u) + 4\eta u(1 - u)$ satisfies

\[ h''(u) = \frac{2}{u(1 - u)} - 8\eta \ge 8 - 8\eta > 0, \qquad 0 < u < 1. \]

Hence $h$ is convex, and since $h'(1/2) = 0$,

\[ h(u) \ge h(\tfrac{1}{2}) = 0, \qquad 0 < u < 1. \]

Consequently, if $\lambda < 0$ and $K|\lambda| \ge 2\eta$, then

\[ \lambda g(U) - 2K\lambda^2 U(1 - U) \le \lambda h(U) \le 0 \]

and thus $f_K(\lambda) \le 1$. Choosing $K = 2\eta/0.62 < 1.247$, this shows $f_K(\lambda) \le 1$ for
$\lambda \le -0.62$, while $f_K(\lambda) \le f_{0.5}(\lambda) \le 1$ for $-0.62 \le \lambda \le 0$ by the preceding case.

This completes the proof of both theorems. □

If we just want a bound on $\psi_Y$, (4.2) and Theorems 4.1 and 4.2 can be stated more
simply as follows (ignoring the better bounds obtained for small $\lambda$).

Corollary 4.3. With $L_0$ as in Theorem 4.1,

\[ \psi_Y(\lambda) \le \begin{cases} e^{1.25\lambda^2}, & \lambda \le 0, \\ e^{12\lambda^2}, & 0 \le \lambda \le L_0, \\ e^{2e^{\lambda}}, & \lambda \ge L_0; \end{cases} \]

in particular, $\psi_Y(\lambda) \le \exp(\max(12\lambda^2, 2e^{\lambda}))$.

□

The bound $e^{2e^{\lambda}}$ is very large even for moderately large $\lambda$, but the next result shows
that $\psi_Y(\lambda)$ really is of essentially this size. In particular, it follows that $\ln \ln \psi_Y(\lambda) \sim \lambda$
as $\lambda \to +\infty$.

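Theorem 4.4. If $\gamma < 2/e$, then for sufficiently large $\lambda$,

\[ \psi_Y(\lambda) \ge \exp(\gamma \lambda^{-1} e^{\lambda}). \]

Proof. Since a moment generating function is convex and $\psi'_Y(0) = \mathbf{E} Y = 0$, $\psi_Y$ is
increasing on $[0, \infty)$. Moreover, $g$ is decreasing on $[0, 1/2]$. Hence, if $0 < \delta < 1/2$, the
integrand in (4.1) with $Z = Y$ is for $0 < u < \delta$ at least $\psi_Y(0) \cdot \psi_Y((1 - \delta)\lambda)\, e^{\lambda g(\delta)}$, and the
same holds for $1 - \delta < u < 1$ by symmetry. Consequently,

\[ \psi_Y(\lambda) \ge 2 \int_0^{\delta} \psi_Y(u\lambda) \cdot \psi_Y((1 - u)\lambda)\, e^{\lambda g(u)}\, du \ge 2\delta\, \psi_Y((1 - \delta)\lambda)\, e^{\lambda g(\delta)}, \qquad 0 < \delta < 1/2. \tag{4.9} \]

Let $a > 1/2$ be a constant to be determined later and choose $\delta := a e^{-\lambda}$. Then $g(\delta) =
1 - O(\lambda e^{-\lambda})$ and thus by (4.9), for $\lambda > \ln(2a)$,

\[ \psi_Y(\lambda) \ge 2a\, e^{-O(\lambda^2 e^{-\lambda})}\, \psi_Y(\lambda - a\lambda e^{-\lambda}). \]

If $0 < \varepsilon < 2a$, there thus exists $\Lambda$ such that for $\lambda \ge \Lambda$,

\[ \psi_Y(\lambda) \ge (2a - \varepsilon)\, \psi_Y(\lambda - a\lambda e^{-\lambda}). \]

Given $\lambda > \Lambda$, let $\lambda_0 := \lambda$ and define inductively $\lambda_{n+1} := \lambda_n - a\lambda_n e^{-\lambda_n}$, $n \ge 0$.
Let $N$ be the smallest integer with $\lambda_N < \Lambda$. Then $\psi_Y(\lambda_n) \ge (2a - \varepsilon)\psi_Y(\lambda_{n+1})$,
$n = 0, \dots, N - 1$, and thus

\[ \psi_Y(\lambda) = \psi_Y(\lambda_0) \ge (2a - \varepsilon)^N \psi_Y(\lambda_N) \ge (2a - \varepsilon)^N. \]

It remains to estimate $N$ from below. Since $e^x$ is increasing,

\[ \int_{\lambda_{n+1}}^{\lambda_n} e^x\, dx \le e^{\lambda_n}(\lambda_n - \lambda_{n+1}) = a\lambda_n \le a\lambda \]

and thus

\[ N a \lambda \ge \int_{\lambda_N}^{\lambda_0} e^x\, dx \ge \int_{\Lambda}^{\lambda} e^x\, dx = e^{\lambda} - e^{\Lambda}. \]

Consequently,

\[ \ln \psi_Y(\lambda) \ge N \ln(2a - \varepsilon) \ge \frac{\ln(2a - \varepsilon)}{a}\, \lambda^{-1} (e^{\lambda} - e^{\Lambda}), \qquad \lambda \ge \Lambda. \]

We choose $a = e/2$, which maximizes $\ln(2a)/a$. Then $\ln(2a)/a = 2/e$, we may choose $\varepsilon$
so small that $\ln(2a - \varepsilon)/a > \gamma$, and the result follows. □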

As is well known, bounds on the moment generating function yield bounds on the 
tails of the distribution. 

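Theorem 4.5. If $y \ge 2e^{L_0} = 12L_0^2 \approx 302.1$, then

\[ \mathbf{P}(Y \ge y) \le \exp(-y(\ln y - 1 - \ln 2)). \]

Proof. For $\lambda \ge L_0$, by Corollary 4.3,

\[ \mathbf{P}(Y \ge y) \le e^{-\lambda y}\, \mathbf{E} e^{\lambda Y} \le \exp(2e^{\lambda} - y\lambda), \]

and the result follows by taking $\lambda = \ln(y/2)$. □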
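
The choice $\lambda = \ln(y/2)$ is the minimizer of the exponent $2e^{\lambda} - y\lambda$, which the following sketch confirms numerically. It works with the logarithm of the bound, because the bound itself underflows double precision for $y$ in the stated range. The function names and the slightly conservative threshold 302.2 are illustrative assumptions.

```python
import math

Y_MIN = 302.2  # slightly above 2 e^{L_0}, which is about 302.1


def log_right_tail_bound(y):
    """ln of the bound in Theorem 4.5: -y (ln y - 1 - ln 2)."""
    if y < Y_MIN:
        raise ValueError("Theorem 4.5 is stated for y >= 2 e^{L_0}")
    return -y * (math.log(y) - 1.0 - math.log(2.0))


def chernoff_exponent(y, lam):
    """Exponent 2 e^lam - y lam appearing in the proof."""
    return 2.0 * math.exp(lam) - y * lam


y = 400.0
lam_star = math.log(y / 2.0)  # the choice made in the proof
at_optimum = chernoff_exponent(y, lam_star)
```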

Remark 4.6. The same estimate holds for every $Z_n$ provided, say, $\psi_{Z_0}(\lambda) \le \exp(12\lambda^2)$;
for example, when $Z_0 = 0$.



Theorem 4.4 suggests that the true size of $\mathbf{P}(Y \ge y)$ is (for large $y$) not much smaller
than the upper bound in Theorem 4.5. Indeed, Knessl and Szpankowski [11] have found
(assuming an as yet unverified regularity hypothesis) a much more precise formula for
the asymptotics of $\mathbf{P}(Y \ge y)$ which is of the order $\exp(-y[\ln y + \ln \ln y + O(1)])$.

For the left tail, Corollary 4.3 similarly implies $\mathbf{P}(Y \le y) \le \exp(-y^2/5)$ for $y \le 0$,
but this result is much weaker than the doubly exponential decay found by Knessl and
Szpankowski [11].



5 Geometric rate of convergence for moment generating 
functions 

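Theorem 5.1. Suppose that $Z_0$ has mean zero and an everywhere finite moment generating
function $\psi_{Z_0}$. Then $\psi_{Z_n}(\lambda) \to \psi_Y(\lambda)$ at a geometric rate for every fixed $\lambda \in \mathbf{R}$.
Explicitly, if $L > 0$ and $K_L$ are such that (4.2) and (4.3) hold, and if moreover

\[ \psi_{Z_0}(\lambda) \le e^{K_L \lambda^2}, \qquad |\lambda| \le L, \tag{5.1} \]

then, for every $n \ge 0$ and $|\lambda| \le L/2$,

\[ \begin{aligned}
|\psi_{Z_n}(\lambda) - \psi_Y(\lambda)| &\le (\operatorname{Var} Z_0 + \sigma^2)^{1/2}\, |\lambda|\, \bigl(\psi_{Z_n}(2\lambda) + \psi_Y(2\lambda)\bigr)^{1/2} (2/3)^{n/2} \\
&\le 2^{1/2} (\operatorname{Var} Z_0 + \sigma^2)^{1/2}\, |\lambda|\, e^{2K_L \lambda^2} (2/3)^{n/2}.
\end{aligned} \]

Of course, if the hypotheses in the first sentence of the theorem's statement are met,
then, given $L > 0$, (5.1) holds for some $K_L < \infty$, which by Theorem 4.1 may be chosen
so large that (4.2) and (4.3) hold, too.

Proof. By (4.3) and induction, the estimate (5.1) holds for every $\psi_{Z_n}$. Fix $n \ge 0$ and
consider the optimal $d_2$-coupling of (the laws of) $Z_n$ and $Y$. Then for $\lambda \in [-L/2, L/2]$
we have, using the mean value theorem and the Cauchy-Schwarz inequality,

\[ \begin{aligned}
|\mathbf{E} e^{\lambda Z_n} - \mathbf{E} e^{\lambda Y}| &\le \mathbf{E} |e^{\lambda Z_n} - e^{\lambda Y}| \le \mathbf{E}\bigl(|\lambda|\, |Z_n - Y|\, e^{\max(\lambda Z_n, \lambda Y)}\bigr) \\
&\le |\lambda|\, (\mathbf{E}|Z_n - Y|^2)^{1/2} \bigl(\mathbf{E} e^{2\max(\lambda Z_n, \lambda Y)}\bigr)^{1/2} \\
&\le |\lambda|\, (\mathbf{E}|Z_n - Y|^2)^{1/2} \bigl(\mathbf{E} e^{2\lambda Z_n} + \mathbf{E} e^{2\lambda Y}\bigr)^{1/2}.
\end{aligned} \]

By the optimality of the coupling and (1.7),

\[ (\mathbf{E}|Z_n - Y|^2)^{1/2} = d_2(F_n, F) \le (\operatorname{Var} Z_0 + \sigma^2)^{1/2} (2/3)^{n/2}, \]

and by (5.1) for $\psi_{Z_n}$ and (4.2),

\[ \mathbf{E} e^{2\lambda Z_n} + \mathbf{E} e^{2\lambda Y} \le 2 e^{K_L (2\lambda)^2}, \]

whence the result follows. □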



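Note further that the operator $\psi_Z \mapsto \psi_{SZ}$ given by (4.1) is monotone, in the specific
sense that if $\psi_Z(\lambda) \le \psi_W(\lambda)$ for $|\lambda| \le L$, then also $\psi_{SZ}(\lambda) \le \psi_{SW}(\lambda)$ for $|\lambda| \le L$.
In particular, by induction, if $\psi_{Z_0}(\lambda) \le \psi_{Z_1}(\lambda)$ for $|\lambda| \le L$, then $\psi_{Z_n}(\lambda)$ increases
monotonically to its limit $\psi_Y(\lambda)$ for $|\lambda| \le L$. Likewise, if $\psi_{Z_0}(\lambda) \ge \psi_{Z_1}(\lambda)$ for $|\lambda| \le L$,
then $\psi_{Z_n}(\lambda)$ decreases to $\psi_Y(\lambda)$ for $|\lambda| \le L$.

We give two simple special cases.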

Corollary 5.2. Suppose that $Z_0 = 0$. Then $\psi_{Z_n}(\lambda)$ increases monotonically to $\psi_Y(\lambda)$
for every fixed $\lambda$. If $L > 0$ and $K_L$ are such that (4.2) and (4.3) hold, then, for every
$n \ge 0$ and $|\lambda| \le L/2$,

\[ \begin{aligned}
0 \le \psi_Y(\lambda) - \psi_{Z_n}(\lambda) &\le 2^{1/2} \sigma\, |\lambda|\, (\psi_Y(2\lambda))^{1/2} (2/3)^{n/2} \\
&\le 2^{1/2} \sigma\, |\lambda|\, e^{2K_L \lambda^2} (2/3)^{n/2}.
\end{aligned} \]

Proof. Since $\mathbf{E} Z_1 = 0$, by Jensen's inequality $\psi_{Z_1}(\lambda) \ge 1 = \psi_{Z_0}(\lambda)$, and the monotonicity
follows. In particular, $\psi_{Z_n}(2\lambda) \le \psi_Y(2\lambda)$, and since (5.1) trivially is satisfied,
the result follows from Theorem 5.1. □

Corollary 5.3. Suppose $L > 0$ and $K_L$ are such that (4.2) and (4.3) hold, and let
$Z_0 \sim N(0, 2K_L)$. Then $\psi_{Z_n}(\lambda)$ decreases monotonically to $\psi_Y(\lambda)$ for every fixed $\lambda$ with
$|\lambda| \le L$, and, for every $n \ge 0$ and $|\lambda| \le L/2$,

\[ \begin{aligned}
0 \le \psi_{Z_n}(\lambda) - \psi_Y(\lambda) &\le (4K_L + 2\sigma^2)^{1/2}\, |\lambda|\, (\psi_{Z_n}(2\lambda))^{1/2} (2/3)^{n/2} \\
&\le (4K_L + 2\sigma^2)^{1/2}\, |\lambda|\, e^{2K_L \lambda^2} (2/3)^{n/2}.
\end{aligned} \]

Proof. Since $\psi_{Z_0}(\lambda) = e^{K_L \lambda^2}$, (4.3) yields $\psi_{Z_1}(\lambda) \le \psi_{Z_0}(\lambda)$, and the monotonicity
follows. The estimate thus follows from Theorem 5.1. □

6 On numerical calculations 

The preceding results make it possible, in principle at least, to calculate the density, 
distribution, characteristic, and moment generating functions of Y numerically, with 
provable arbitrarily high accuracy. 

To begin, the results of earlier sections show that it suffices to start with a suitable
$\mathcal{L}(Z_0)$, for example unit mass at 0 or a normal distribution, and then calculate the
corresponding quantity for $Z_n$, for a large $n$ that can be determined. The distribution of
$Z_n$ can be calculated recursively; for the characteristic and moment generating functions
we have the recurrence relations (2.4) and (4.1), while for the density functions we have
the following recursion:

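Theorem 6.1. If $n \ge 0$ is arbitrary and $Z_0$ has a bounded continuous density function
$f_0$, or if $Z_0$ is arbitrary and $n \ge 3$, then $Z_n$ and $Z_{n+1}$ have bounded continuous
density functions $f_n$ and $f_{n+1}$ satisfying the identity

$$f_{n+1}(x) = \int_{u=0}^{1} \int_{z \in \mathbb{R}} f_n(z)\, f_n\!\left(\frac{x - g(u) - (1-u)z}{u}\right) \frac{1}{u}\, dz\, du, \qquad x \in \mathbb{R}, \tag{6.1}$$

with $g(\cdot)$ the function appearing in the definition (1.4) of $S$.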

Proof. Our proof (similar to that of Theorem 4.1 in [6]) is by induction on $n \ge 0$ in the
first case and on $n \ge 3$, using Theorem 3.1 to get started, in the second case. We may
therefore assume as our induction hypothesis that $f_n$ is bounded and continuous. It is
easily checked that, for each $0 < u < 1$, the inner integral

$$h_u(x) := \int_{z \in \mathbb{R}} f_n(z)\, f_n\!\left(\frac{x - g(u) - (1-u)z}{u}\right) \frac{1}{u}\, dz$$

is a density function for the random variable

$$u Z_n + (1-u) Z_n^* + g(u), \tag{6.2}$$

and, using dominated convergence, that $h_u$ is bounded and continuous. Indeed, $h_u(x) \le (\sup f_n)/u$,
and since $h_u = h_{1-u}$ by symmetry in (6.2), $h_u(x) \le 2 \sup f_n$, uniformly in
$u$ and $x$. It follows, by dominated convergence again, that $x \mapsto f_{n+1}(x) = \int_0^1 h_u(x)\, du$
is a bounded continuous density for $Z_{n+1}$. □



The integrals in (2.4), (4.1), or (6.1) have to be computed numerically (as does the
integral of $f_n$ to get $F_n$), but that can be done with arbitrary precision since the results
above provide bounds for the integrands and their derivatives. [The function $g(u)$ has
an unbounded derivative as $u \to 0$ or $u \to 1$, but that can be handled by truncating
the interval.] Consequently, to calculate $\phi_{Z_n}(t)$ with given precision for a given $t$, it
suffices to know $\phi_{Z_{n-1}}(t_k)$ with another given precision for a finite number of points $t_k$,
which can be done recursively. (However, a brute force recursion along these lines seems
to require too many numerical integrations to be practical if we want reasonably high
provable accuracy.)

Remark 6.2. To calculate the density $f_n$ numerically, it might be better to compute
$\phi_{Z_n}$ recursively by (2.4) and then use (3.1), instead of using the recursion (6.1) directly.
This is both because (6.1) is a double integral and because we have the simple bounds

$$|\phi_{Z_n}(t)| \le 1 \quad\text{and}\quad |\phi_{Z_n}^{(k)}(t)| \le \mathbf{E}|Z_n|^k, \quad k \ge 1.$$



7 The metrics $d_p$ and a lower bound on $d_2(F_n, F)$

At (1.7) we recalled Rösler's fundamental result

$$d_2(F_n, F) = O(\rho^n)$$

with $\rho := (2/3)^{1/2}$. The question naturally arises as to whether there is a lower bound
that matches at least to the extent that

$$d_2(F_n, F) = \Omega(r^n)$$

for some $r > 0$. Of course, the answer is negative without any further restrictions, since
if $F_0 = F$ then $F_n = F$ for every $n$. However, our main result of this section asserts
that this is the only exception, at least among distributions $F_0$ with finite moments of
all orders:

Theorem 7.1. If $F_0 \ne F$ has finite moments of all orders, then there exists $r > 0$
(depending on $F_0$) so that

$$d_2(F_n, F) = \Omega(r^n).$$



Our arguments for Theorem 7.1 will require use of metrics $d_p$ generalizing (1.5). So
we will warm up in Section 7.1 by recalling the definition of, and two useful facts about,
$d_p$ and in Section 7.2 by extending the upper bound result (1.7) to $d_p$ for $p \ge 1$. Then
in Section 7.3 we will prove a sharpened version of Theorem 7.1 (namely, Theorem 7.7).



7.1 The metrics $d_p$

For real $1 \le p < \infty$, let $\|X\|_p := (\mathbf{E}|X|^p)^{1/p}$ denote the $L^p$-norm, and let $d_p$ denote the
metric on the space of probability distributions with finite $L^p$-norm defined by

$$d_p(F, G) := \min \|X - Y\|_p,$$

taking the minimum, as at (1.5), over all couplings of $\mathcal{L}(X) = F$ and $\mathcal{L}(Y) = G$.
It is worth noting that there is a coupling [namely, $X = F^{-1}(U)$ and $Y = G^{-1}(U)$
for $U$ uniform and a suitable definition of the inverse probability transform $F^{-1}$] that
achieves the minimum simultaneously for each $1 \le p < \infty$ (assuming $F$ and $G$ have
finite moments of all orders): see [1].
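For two empirical distributions with the same number of equally weighted atoms, the quantile coupling $X = F^{-1}(U)$, $Y = G^{-1}(U)$ amounts to pairing the sorted samples, so $d_p$ can be evaluated directly in that special case. The Python sketch below does this; the function name and the restriction to equal-size samples are assumptions made for illustration.

```python
def empirical_dp(xs, ys, p):
    """d_p between the empirical laws of two equal-size samples, for real p >= 1.

    Pairing the sorted samples realizes the quantile coupling, which attains
    the minimum for every p >= 1 simultaneously.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be nonempty and of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    pairs = zip(sorted(xs), sorted(ys))
    mean_pth_power = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return mean_pth_power ** (1.0 / p)
```

For example, `empirical_dp([0, 2], [5, 1], 1)` pairs 0 with 1 and 2 with 5, so the same pairing serves for every `p`.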

We begin with two elementary facts that will be useful later. The proof of the first
fact (Lemma 7.2) shows that $S$ is a contraction for the $d_p$-metric.

Lemma 7.2. Consider real $1 \le p < \infty$. The $d_p$-distance from the limiting Quicksort
distribution $F$ does not increase when the operator $S$ of (1.4) is applied. Therefore,
$d_p(F_n, F)$ is nonincreasing, and hence bounded, in $n$ if $\mathbf{E}|Z_0|^p < \infty$.

Proof. With a slight abuse of notation, we find, for $Z$ with any law,

$$d_p(SZ, Y) = d_p(SZ, SY) \le \|U(Z - Y) + (1 - U)(Z^* - Y^*)\|_p, \tag{7.1}$$

coupling $(Y, Z)$ optimally and $(Y^*, Z^*)$ optimally and choosing $U$, $(Y, Z)$, and $(Y^*, Z^*)$
to be independent. In calculating the $L^p$-norm value on the right in (7.1), condition on $U$
and then use subadditivity of the $L^p$-norm together with independence to bound that value
by $\|Z - Y\|_p = d_p(Z, Y)$. This establishes the first assertion: $d_p(SZ, Y) \le d_p(Z, Y)$.

Therefore, $d_p(F_n, F) = d_p(S^n Z_0, Y)$ is nonincreasing, and hence bounded by $d_p(Z_0, Y)$,
which is bounded by $\|Z_0\|_p + \|Y\|_p < \infty$ if $\mathbf{E}|Z_0|^p < \infty$. □



Remark 7.3. Conversely, if $\|SZ\|_p < \infty$, then $\mathbf{E}|uZ + (1 - u)Z^*|^p < \infty$ for some
$u \in (0,1)$, and thus $\mathbf{E}|Z|^p < \infty$ too. Hence, if $\mathbf{E}|Z_0|^p = \infty$, then $\mathbf{E}|Z_n|^p = \infty$ and
$d_p(Z_n, Y) = \infty$ for all $n$.



Lemma 7.4. For real $2 < p < q < \infty$ we have, for any $F$ and $G$,

$$d_p(F, G) \le d_2^{\,2(q-p)/(p(q-2))}(F, G) \times d_q^{\,q(p-2)/(p(q-2))}(F, G).$$

Proof. Using the common optimal coupling for $d_2$ and $d_q$, this is immediate from the
inequality

$$\|X\|_p \le \|X\|_2^{\,2(q-p)/(p(q-2))}\, \|X\|_q^{\,q(p-2)/(p(q-2))},$$

which in turn follows from the fact [16, Exercise 4(b) of Chapter 3] that $\ln \|X\|_p^p$ is
convex in $p \in (0, \infty)$. □



7.2 Geometric rate of convergence in each metric $d_p$

Under suitable conditions, we can establish a geometric rate of convergence for $d_p(F_n, F)$
for any real $1 \le p < \infty$. We begin with an elementary lemma.

Lemma 7.5. If $p \ge 2$, then for all $x, y \ge 0$,

$$(x + y)^p \le x^p + y^p + c_p\left(x^{p-1}y + xy^{p-1}\right),$$

where $c_p := p(p-1)2^{p-2}$.

Proof.

$$\begin{aligned}
(x + y)^p - x^p - y^p &= \int_0^y p\left((x + t)^{p-1} - t^{p-1}\right) dt = \int_{t=0}^{y} \int_{u=0}^{x} p(p-1)(t + u)^{p-2}\, du\, dt \\
&\le p(p-1)\,xy\,(x + y)^{p-2} \le p(p-1)\,xy\,2^{p-2}\left(x^{p-2} + y^{p-2}\right).
\end{aligned}$$

□
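As a quick sanity check of Lemma 7.5 (not a proof), the inequality can be evaluated on a grid. The Python sketch below does so; the function names, the grid, and the sampled values of $p$ are arbitrary choices made for illustration.

```python
def lemma_7_5_gap(x, y, p):
    """Right-hand side minus left-hand side of the inequality in Lemma 7.5."""
    c_p = p * (p - 1) * 2.0 ** (p - 2)
    lhs = (x + y) ** p
    rhs = x ** p + y ** p + c_p * (x ** (p - 1) * y + x * y ** (p - 1))
    return rhs - lhs

def smallest_gap_on_grid(p, steps=40, scale=5.0):
    """Minimum of the gap over a grid in [0, scale]^2; nonnegative if the lemma holds there."""
    pts = [scale * i / steps for i in range(steps + 1)]
    return min(lemma_7_5_gap(x, y, p) for x in pts for y in pts)
```

The gap vanishes when $x = 0$ or $y = 0$, so the minimum over the grid is expected to be zero up to rounding.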
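Theorem 7.6. Let $p_0 \approx 6.557$ be the largest positive solution to

$$\left(\frac{2}{p_0 + 1}\right)^{1/p_0} = \left(\frac{2}{3}\right)^{1/2} \tag{7.2}$$

and let, for any $\varepsilon > 0$,

$$\beta_p := \begin{cases}
\left(\frac{2}{3}\right)^{1/2}, & 1 \le p < p_0, \\
\left(\frac{2}{3}\right)^{1/2} + \varepsilon, & p = p_0, \\
\left(\frac{2}{p+1}\right)^{1/p}, & p > p_0.
\end{cases} \tag{7.3}$$

Thus, for $p \ge 2$ except $p = p_0$, $\beta_p = \max\left((2/3)^{1/2}, (2/(p+1))^{1/p}\right)$. Then, for any $Z_0$
with zero mean and finite variance, and every $p \ge 1$ such that $\mathbf{E}|Z_0|^p < \infty$, there exists
a constant $a_p < \infty$ [depending on $\mathcal{L}(Z_0)$] such that

$$d_p(Z_n, Y) \le a_p \beta_p^n. \tag{7.4}$$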



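Proof. First we note that (7.2) can be written $(3/2)^{p_0/2} = (p_0 + 1)/2$. One root of this
equation is 2, and since $(3/2)^{p/2}$ is convex, with derivative less than $1/2$ at $p = 2$, it
follows that the equation has two positive roots, 2 and $p_0 > 2$, and that $(2/3)^{p/2} > 2/(p + 1)$
for $2 < p < p_0$, while $(2/3)^{p/2} < 2/(p + 1)$ for $p > p_0$.

Next we note that (7.4) holds for $p \le 2$, with

$$a_p := (\operatorname{Var} Z_0 + \sigma^2)^{1/2}, \qquad p \le 2,$$

by (1.7) and the inequality $d_p \le d_2$, $p \le 2$. We then proceed by induction on $\lceil p \rceil$. For the
induction step, suppose that $p > 2$ and that $Z_0$ has zero mean and satisfies $\|Z_0\|_p < \infty$.
By the induction hypothesis, there exist constants $0 < a_q < \infty$, $1 \le q \le p - 1$, such
that

$$d_q(Z_n, Y) \le a_q \beta_q^n \quad\text{for all } q \le p - 1 \text{ and } n \ge 0. \tag{7.5}$$

Using our usual coupling of $Z_n$ and $Y$ in terms of the optimal coupling of $Z_{n-1}$ and $Y$,
we find easily by Lemma 7.5, for $n \ge 1$ [with $(Z_{n-1}, Y)$, $(Z_{n-1}^*, Y^*)$, and $U$ independent],

$$\begin{aligned}
d_p^p(Z_n, Y) &\le \mathbf{E}\left|U(Z_{n-1} - Y) + (1 - U)(Z_{n-1}^* - Y^*)\right|^p \\
&\le \mathbf{E}\left(U|Z_{n-1} - Y| + (1 - U)|Z_{n-1}^* - Y^*|\right)^p \\
&\le \mathbf{E}\left(U^p |Z_{n-1} - Y|^p\right) + \mathbf{E}\left((1 - U)^p |Z_{n-1}^* - Y^*|^p\right) \\
&\quad + c_p\, \mathbf{E}\left(U^{p-1}(1 - U)\, |Z_{n-1} - Y|^{p-1}\, |Z_{n-1}^* - Y^*|\right) \\
&\quad + c_p\, \mathbf{E}\left(U(1 - U)^{p-1}\, |Z_{n-1} - Y|\, |Z_{n-1}^* - Y^*|^{p-1}\right) \\
&= \frac{2}{p+1}\, d_p^p(Z_{n-1}, Y) + \frac{2c_p}{p(p+1)}\, d_{p-1}^{p-1}(Z_{n-1}, Y)\, d_1(Z_{n-1}, Y).
\end{aligned}$$

So by induction on $n$ it follows that

$$d_p^p(Z_n, Y) \le \left(\frac{2}{p+1}\right)^n d_p^p(Z_0, Y) + \frac{2c_p}{p(p+1)} \sum_{i=0}^{n-1} \left(\frac{2}{p+1}\right)^{n-1-i} d_{p-1}^{p-1}(Z_i, Y)\, d_1(Z_i, Y).$$

By the induction hypothesis (7.5) this yields, for some $a_1, a_2 < \infty$ (depending on $p$),

$$d_p^p(Z_n, Y) \le a_1 \left(\frac{2}{p+1}\right)^n + a_2 \sum_{i=0}^{n-1} \left(\frac{2}{p+1}\right)^{n-i} \left(\beta_1 \beta_{p-1}^{p-1}\right)^i. \tag{7.6}$$

Let $\gamma := \beta_1 \beta_{p-1}^{p-1}$. We break our treatment into three cases:

(i) If $\gamma > 2/(p + 1)$, we write the sum in (7.6) as

$$\gamma^n \sum_{i=0}^{n-1} \left(\frac{2}{(p+1)\gamma}\right)^{n-i} \le \left(1 - \frac{2}{(p+1)\gamma}\right)^{-1} \gamma^n,$$

and thus (7.6) shows that (7.4) holds with $\beta_p^p = \gamma$.

(ii) If $\gamma < 2/(p + 1)$, we write the sum in (7.6) as

$$\left(\frac{2}{p+1}\right)^n \sum_{i=0}^{n-1} \left(\frac{(p+1)\gamma}{2}\right)^i \le \left(1 - \frac{(p+1)\gamma}{2}\right)^{-1} \left(\frac{2}{p+1}\right)^n,$$

and thus (7.4) holds with $\beta_p^p = 2/(p + 1)$.

(iii) If $\gamma = 2/(p + 1)$, the sum in (7.6) equals $n(2/(p + 1))^n$. Consequently, (7.4) holds
with any $\beta_p > (2/(p + 1))^{1/p}$.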



It remains to verify that this yields the $\beta_p$ given in (7.3).

First, if $2 < p < p_0$, then the induction hypothesis yields $\gamma = \beta_1 \beta_{p-1}^{p-1} = (2/3)^{p/2} > 2/(p+1)$,
so case (i) gives $\beta_p^p = (2/3)^{p/2}$. Similarly, for $p = p_0$, $\gamma = (2/3)^{p/2} = 2/(p + 1)$
and (iii) shows that any $\beta_p > (2/(p + 1))^{1/p} = (2/3)^{1/2}$ will do.

For $p_0 < p < p_0 + 1$, we have $\gamma = \beta_1 \beta_{p-1}^{p-1} = (2/3)^{p/2} < 2/(p + 1)$, so case (ii) yields
$\beta_p^p = 2/(p + 1)$. The same applies for $p = p_0 + 1$, since again $(2/3)^{p/2} < 2/(p + 1)$ and
we thus may choose $\varepsilon$ so small that $\gamma = (2/3)^{1/2}\left((2/3)^{1/2} + \varepsilon\right)^{p-1} < 2/(p + 1)$.

Finally, for $p > p_0 + 1$, we have

$$\gamma = \beta_1 \beta_{p-1}^{p-1} = \left(\frac{2}{3}\right)^{1/2} \frac{2}{p} < \frac{2}{p+1},$$

since $p/(p+1)$ is increasing and equals $(2/3)^{1/2}$ when $p = \left(\sqrt{3/2} - 1\right)^{-1} = 2\left(\sqrt{6} - 2\right)^{-1} = \sqrt{6} + 2 < 5 < p_0$.
Hence case (ii) applies. □
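The constant $p_0$ and the rates $\beta_p$ are easy to evaluate numerically. The Python sketch below finds $p_0$ by bisection on the rewritten form $(3/2)^{p/2} = (p+1)/2$ of (7.2) and then evaluates (7.3); the function names and the bracketing interval $[3, 10]$ are choices made for this illustration.

```python
import math

def largest_root_p0(tol=1e-12):
    """Largest root of (3/2)**(p/2) = (p + 1)/2, by bisection on [3, 10]."""
    def f(p):
        # Logarithm of the ratio of the two sides; negative between the roots 2 and p0.
        return 0.5 * p * math.log(1.5) - math.log(0.5 * (p + 1.0))
    lo, hi = 3.0, 10.0  # f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def beta(p, p_zero):
    """Rate beta_p from (7.3) for real p >= 1.

    At p = p_zero this returns (2/3)**0.5, the infimum of the admissible
    rates; the theorem itself requires a strictly larger value there.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if p < p_zero:
        return math.sqrt(2.0 / 3.0)
    return (2.0 / (p + 1.0)) ** (1.0 / p)
```

For instance, `beta(9, largest_root_p0())` evaluates $(1/5)^{1/9}$, which exceeds $(2/3)^{1/2}$, so the guaranteed rate deteriorates once $p$ passes $p_0$.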



7.3 Lower bounds 



The main goal of this subsection is to establish Theorem 7.1, or rather the sharper
Theorem 7.7 below. Since (as noted in Section 4) the limiting Quicksort distribution $F$
has everywhere finite moment generating function, it is uniquely determined by its
moments. Hence if $F_0 \ne F$ has finite moments of all orders, then $\mathbf{E} Z_0^j \ne \mathbf{E} Y^j$ for some
integer $j \ge 1$.

Theorem 7.7. Suppose $F_0 \ne F$ has finite moments of all orders, and let $p$ be the
smallest positive integer such that $\mathbf{E} Z_0^p \ne \mathbf{E} Y^p$. Then, for any $0 < r < \left(\frac{2}{p+1}\right)^{p/2}$,

$$d_2(F_n, F) = \Omega(r^n).$$

(The implicit multiplicative constant depends on both $F_0$ and $r$.)

Remark 7.8. The cases $p = 1$ and $p = 2$ are a bit special, and in these cases we
claim that Theorem 7.7 holds even with $r = \left(\frac{2}{p+1}\right)^{p/2}$, i.e., with $r = 1$ and $r = 2/3$,
respectively, and without the assumption that $F_0$ has finite moments of all orders.

We may and shall assume that $\mathbf{E} Z_0^2 < \infty$, since otherwise $d_2(F_n, F) = \infty$.

First, $p = 1$ when $\mathbf{E} Z_0 \ne \mathbf{E} Y = 0$; in this case $Z_n$ converges in distribution to
$Y + \mathbf{E} Z_0$ and not to $Y$, and thus $\inf_n d_2(Z_n, Y) > 0$, i.e., the theorem holds with $r = 1$.
Indeed, we have the sharper result that

$$d_2(Z_n, Y) = d_2(Y + \mathbf{E} Z_0, Y) + O\left(d_2(Z_n, Y + \mathbf{E} Z_0)\right) = |\mathbf{E} Z_0| + O\left((2/3)^{n/2}\right).$$

Next, $p = 2$ when $\mathbf{E} Z_0 = 0$ but $\operatorname{Var} Y \ne \operatorname{Var} Z_0$; in this case Theorem 7.9 shows
that the result holds with $r = 2/3$. Even in this case we have a gap between the lower
bound $\Omega((2/3)^n)$ and Rösler's upper bound $O((2/3)^{n/2})$; it is an open problem to find
the rate of approximation more precisely.



We prove Theorem 7.7 using the following Theorem 7.9, which is a similar lower
bound for the $d_p$-metric.

Theorem 7.9. Let $p \ge 1$ be an integer, and suppose that $\mathbf{E} Z_0^j = \mathbf{E} Y^j$ for integers
$1 \le j \le p - 1$, and that $\mathbf{E} Z_0^p$ exists and is finite but fails to equal $\mathbf{E} Y^p$. Then

$$d_p(F_n, F) = \Omega\left(\left(\frac{2}{p+1}\right)^n\right).$$

Theorem 7.9 is, in turn, a simple consequence of the following two elementary lemmas.

The first lemma demonstrates a sense in which the value of $p$ in Theorems 7.7 and 7.9
persists from $F_0$ to each $F_n$; the second gives a general lower bound on $d_p$ in terms of
discrepancy in $p$th moments.

Lemma 7.10. Let $p \ge 1$ be an integer, and suppose for $n = 0$ that $\mathbf{E} Z_n^j = \mathbf{E} Y^j$ for
integers $1 \le j \le p - 1$, and that $\mathbf{E} Z_n^p$ exists and is finite but fails to equal $\mathbf{E} Y^p$. Then
for every $n \ge 0$ the same is true and, moreover,

$$\mathbf{E} Z_n^p - \mathbf{E} Y^p = \left(\frac{2}{p+1}\right)^n \left(\mathbf{E} Z_0^p - \mathbf{E} Y^p\right).$$

Proof. If $\mathbf{E}|Z|^m < \infty$, then, with $Z$, $Z^* \stackrel{d}{=} Z$, and $U$ independent, by (1.4) and a trinomial
expansion we have

$$\begin{aligned}
\mathbf{E}(SZ)^m &= \sum_{j+k \le m} \frac{m!}{j!\, k!\, (m - j - k)!}\, \mathbf{E}\left(U^j Z^j (1 - U)^k (Z^*)^k g(U)^{m-j-k}\right) \\
&= \sum_{j+k \le m} \frac{m!}{j!\, k!\, (m - j - k)!}\, \mathbf{E}\left(U^j (1 - U)^k g(U)^{m-j-k}\right) \mathbf{E} Z^j\, \mathbf{E} Z^k.
\end{aligned}$$

We apply this with $m = 1, \dots, p$ for both $Z = Z_{n-1}$ and $Z = Y$, and note that by
induction on $n$, all terms in the sum with $j \le p - 1$ and $k \le p - 1$ coincide for the two
choices of $Z$. Hence, $\mathbf{E} Z_n^m = \mathbf{E} Y^m$ for $1 \le m \le p - 1$ and

$$\mathbf{E} Z_n^p - \mathbf{E} Y^p = \left(\mathbf{E} U^p + \mathbf{E}(1 - U)^p\right)\left(\mathbf{E} Z_{n-1}^p - \mathbf{E} Y^p\right) = \frac{2}{p+1}\left(\mathbf{E} Z_{n-1}^p - \mathbf{E} Y^p\right),$$

and the result follows. □
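The case $p = 2$ of Lemma 7.10 can be checked numerically. When $\mathbf{E} Z = 0$, the trinomial expansion reduces to $\mathbf{E}(SZ)^2 = \frac{2}{3}\mathbf{E} Z^2 + \mathbf{E}\, g(U)^2$, whose fixed point is $\mathbf{E} Y^2 = 3\,\mathbf{E}\, g(U)^2$. The Python sketch below iterates this recursion; the explicit formula for $g$ (the standard Quicksort toll function) and the closed form $7 - 2\pi^2/3$ for $\operatorname{Var} Y$ used in the comparison are recalled from the Quicksort literature rather than from this section, and the function names are illustrative.

```python
import math

def g(u):
    # Assumed explicit form of the function g in the definition of S.
    return 1.0 + 2.0 * u * math.log(u) + 2.0 * (1.0 - u) * math.log(1.0 - u)

def second_moment_of_g(n=100_000):
    # Midpoint-rule estimate of E g(U)^2 for U uniform on (0, 1).
    h = 1.0 / n
    return h * sum(g((i + 0.5) * h) ** 2 for i in range(n))

def second_moments(v0, steps, eg2):
    # Iterate E Z_{n+1}^2 = (2/3) E Z_n^2 + E g(U)^2, valid when E Z_n = 0.
    values = [v0]
    for _ in range(steps):
        values.append(2.0 / 3.0 * values[-1] + eg2)
    return values

EG2 = second_moment_of_g()
VAR_Y = 3.0 * EG2  # fixed point of the recursion

if __name__ == "__main__":
    for n, v in enumerate(second_moments(0.0, 10, EG2)):
        # The gap to VAR_Y shrinks by the factor 2/3 at every step.
        print(n, v, v - VAR_Y)
```

Starting from $Z_0 = 0$, the printed gap equals $-(2/3)^n \operatorname{Var} Y$ up to rounding, in line with the lemma.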



Lemma 7.11. Let $p \ge 1$ be an integer. Then, for any $F$ and $G$,

$$d_p(F, G) \ge \frac{|\mathbf{E} X^p - \mathbf{E} Y^p|}{\sum_{j=0}^{p-1} \|X\|_p^{p-1-j}\, \|Y\|_p^{j}},$$

with $X \sim F$ and $Y \sim G$ (and $0^0 := 1$).

Proof. Let $(X, Y)$ be an optimal coupling of $F$ and $G$. If $p = 1$, then

$$d_1(F, G) = \|X - Y\|_1 = \mathbf{E}|X - Y| \ge |\mathbf{E} X - \mathbf{E} Y|,$$

as desired. If $p \ge 2$, we employ the factorization

$$X^p - Y^p = (X - Y) \sum_{j=0}^{p-1} X^{p-1-j} Y^j,$$

whence

$$|\mathbf{E} X^p - \mathbf{E} Y^p| \le \mathbf{E}|X^p - Y^p| \le \|X - Y\|_p \left\| \sum_{j=0}^{p-1} X^{p-1-j} Y^j \right\|_{p/(p-1)} \le d_p(F, G) \sum_{j=0}^{p-1} \left\| X^{p-1-j} Y^j \right\|_{p/(p-1)}, \tag{7.7}$$

where at the second inequality we have employed Hölder's inequality and at the third we
have invoked the optimality of the coupling. Another application of Hölder's inequality,
this time with conjugate exponents $\frac{p-1}{p-1-j}$ and $\frac{p-1}{j}$, yields

$$\left\| X^{p-1-j} Y^j \right\|_{p/(p-1)} \le \|X\|_p^{p-1-j}\, \|Y\|_p^{j} \tag{7.8}$$

for $1 \le j \le p - 2$, and (7.8) is trivially an equality when $j = 0$ or $j = p - 1$. Combining
(7.7) and (7.8) and rearranging, we obtain the desired result. □



Proof of Theorem 7.9. By Lemma 7.2, we have the bound

$$\|Z_n\|_p \le d_p(Z_n, Y) + \|Y\|_p \le d_p(Z_0, Y) + \|Y\|_p \le \|Z_0\|_p + 2\|Y\|_p.$$

Thus Lemmas 7.11 and 7.10 yield the explicit bound

$$d_p(Z_n, Y) \ge \frac{|\mathbf{E} Z_0^p - \mathbf{E} Y^p|}{\sum_{j=0}^{p-1} \left(\|Z_0\|_p + 2\|Y\|_p\right)^{p-1-j} \|Y\|_p^{j}} \left(\frac{2}{p+1}\right)^n.$$

□

Proof of Theorem 7.7. The cases $p = 1$ and $p = 2$ follow immediately from Theorem 7.9;
see also Remark 7.8.

When $p \ge 3$, fix $q > p$. By Lemmas 7.4 and 7.2 (the latter applied to $d_q$), for some
$C_q$ we have

$$d_p(F_n, F) \le C_q\, d_2^{\,2(q-p)/(p(q-2))}(F_n, F),$$

and thus Theorem 7.9 implies Theorem 7.7 with $r = \left(\frac{2}{p+1}\right)^{p(q-2)/(2(q-p))}$. By taking $q$ sufficiently
large, we obtain the result for any $r < \left(\frac{2}{p+1}\right)^{p/2}$. □
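To see how the choice of $q$ affects the rate obtained in this proof, the exponent can be tabulated directly. The Python sketch below (function names are illustrative) evaluates $r = (2/(p+1))^{p(q-2)/(2(q-p))}$ and its limit as $q \to \infty$.

```python
def rate_from_q(p, q):
    """Rate r from the proof of Theorem 7.7, for integer p >= 3 and real q > p."""
    if not (p >= 3 and q > p):
        raise ValueError("need p >= 3 and q > p")
    exponent = p * (q - 2.0) / (2.0 * (q - p))
    return (2.0 / (p + 1.0)) ** exponent

def limiting_rate(p):
    """Supremum (2/(p+1))**(p/2) of the attainable rates, approached as q grows."""
    return (2.0 / (p + 1.0)) ** (p / 2.0)
```

For $p = 3$ the rate rises from $1/8$ at $q = 4$ toward $2^{-3/2}$ as $q$ increases, which is why the theorem allows every $r$ strictly below the limit but not the limit itself.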



We have assumed in Theorems 7.1 and 7.7 that $F_0$ has finite moments of all orders.
What happens if this fails? If $\mathbf{E}|Z_0|^p = \infty$ for some $p > 2$, then $d_p(Z_n, Y) = \infty$ for
all $n$, but what can be said about $d_2(Z_n, Y)$? It seems reasonable to conjecture that
we have at least as large $d_2$-distance in this case as in the nicer case with all moments
finite, and that Theorems 7.1 and 7.7 hold for all $F_0 \ne F$. Unfortunately, we have not
been able to prove this, but we offer the following partial result.

Theorem 7.12. If $d_2(F_n, F) = O(r^n)$ for every $r > 0$, then $F_0 = F$. Consequently, if
$F_0 \ne F$, there exists $r > 0$ such that $d_2(F_n, F) > r^n$ for infinitely many values of $n$.

Proof. We may assume that $\mathbf{E} Z_0 = 0$ and $\mathbf{E} Z_0^2 = \mathbf{E} Y^2$ (in particular, $\mathbf{E} Z_0^2 < \infty$),
because otherwise $d_2(F_n, F) = \Omega(r^n)$ with $r = 2/3$: see Remark 7.8. By induction then
$\mathbf{E} Z_n = 0$ and $\mathbf{E} Z_n^2 = \mathbf{E} Y^2 < 1$ for every $n$: see Lemma 7.10.

As usual, let $Z$, $Z^* \stackrel{d}{=} Z$, and $U$ be independent. If $|Z| > 2x$, $|Z^*| \le 2$, and $\frac{2}{3} \le U \le 1$,
where $x \ge 5$, then

$$|SZ| = |UZ + (1 - U)Z^* + g(U)| \ge \tfrac{2}{3}|Z| - \tfrac{1}{3}|Z^*| - 1 > \tfrac{4}{3}x - \tfrac{5}{3} \ge x.$$

Thus,

$$\mathbf{P}(|SZ| > x) \ge \mathbf{P}(|Z| > 2x) \cdot \mathbf{P}(|Z| \le 2) \cdot \tfrac{1}{3}, \qquad x \ge 5.$$

If further $\mathbf{E} Z^2 < 1$, and thus by Chebyshev's inequality $\mathbf{P}(|Z| \le 2) = 1 - \mathbf{P}(|Z| > 2) > 1 - \frac{1}{4} = \frac{3}{4}$,
this yields

$$\mathbf{P}(|SZ| > x) \ge \tfrac{1}{4}\, \mathbf{P}(|Z| > 2x), \qquad x \ge 5.$$

Hence, by induction on $n$ and our assumption on the first two moments of $Z_0$, for
any $x \ge 5$,

$$\mathbf{P}(|Z_n| > x) \ge 4^{-n}\, \mathbf{P}(|Z_0| > 2^n x), \qquad n \ge 0,$$

and in particular

$$\mathbf{P}(|Z_n| > 2^n) \ge 4^{-n}\, \mathbf{P}(|Z_0| > 4^n), \qquad n \ge 3. \tag{7.9}$$

Now suppose that $d_2(Z_n, Y) = O(r^n)$. Using an optimal coupling between $Z_n$ and
$Y$, and the fact that $Y$ has moments of all orders, we find

$$\begin{aligned}
\mathbf{P}(|Z_n| > 2^n) &\le \mathbf{P}(|Z_n - Y| > 2^{n-1}) + \mathbf{P}(|Y| > 2^{n-1}) \\
&\le 2^{2-2n}\, d_2^2(Z_n, Y) + \mathbf{P}(|Y| > 2^{n-1}) = O\left(2^{-2n} r^{2n}\right).
\end{aligned}$$

Combining this with (7.9), we obtain, for $n \ge 3$,

$$\mathbf{P}(|Z_0| > 4^n) \le 4^n\, \mathbf{P}(|Z_n| > 2^n) = O\left(r^{2n}\right),$$

which implies that $\mathbf{E}|Z_0|^p < \infty$ for every $p > 0$ such that $4^p r^2 < 1$.

Consequently, if $d_2(Z_n, Y) = O(r^n)$ for every $r > 0$, then $\mathbf{E}|Z_0|^p < \infty$ for every
$p > 0$, and Theorem 7.7 applies to yield $F_0 = F$. □



Remark 7.13. Our proof of Theorem 7.12, combined with the proof of Theorem 7.7,
shows that if $F_0 \ne F$ and $p$ (assumed $\ge 3$ here) is the smallest positive integer such
that either $\mathbf{E} Z_0^p$ does not exist or $\mathbf{E} Z_0^p \ne \mathbf{E} Y^p$, then $d_2(Z_n, Y) > r^n$ for infinitely many
values of $n$ for any $0 < r < r_p$, with $r_p := 2^{-q}$, where $q$ is the unique solution in $(p, \infty)$
to $2^{2q(q-p)/(p(q-2))} = \frac{p+1}{2}$.

8 Other lower bounds 

In Section 7 we showed that convergence of the iterates $F_n$ to $F$ in the $d_2$-metric is not
faster than geometric. In this final section we show likewise that the convergence is not
faster than geometric in the other metrics we have considered in this paper. We again
assume that $F_0 \ne F$ has finite moments of all orders. (Without this hypothesis, we can
prove partial results by the method used in the proof of Theorem 7.12, but we do not
know whether the full results hold.)

8.1 Kolmogorov–Smirnov and total variation distances

We begin with a simple lemma. 

Lemma 8.1. Let $p > 0$. For any $X \sim F$ and $Y \sim G$ each with finite $p$th absolute
moment, if $K := d_{KS}(F, G)$, then, for any $0 < M < \infty$,

$$\left|\mathbf{E}(X^p; X > 0) - \mathbf{E}(Y^p; Y > 0)\right| \le K M^p + \mathbf{E}(X^p; X > M) + \mathbf{E}(Y^p; Y > M)$$

and, if $p$ is an integer,

$$|\mathbf{E} X^p - \mathbf{E} Y^p| \le 2K M^p + \mathbf{E}(|X|^p; |X| > M) + \mathbf{E}(|Y|^p; |Y| > M).$$

Proof. Define $X_M := \min(X^+, M)$, where $X^+ = \max(X, 0)$, and similarly $Y_M$. Then

$$0 \le \mathbf{E}(X^p; X > 0) - \mathbf{E} X_M^p \le \mathbf{E}(X^p; X > M)$$

and similarly for $Y$, while

$$\begin{aligned}
\left|\mathbf{E} X_M^p - \mathbf{E} Y_M^p\right| &= \left| \int_0^M p x^{p-1}\, \mathbf{P}(X > x)\, dx - \int_0^M p x^{p-1}\, \mathbf{P}(Y > x)\, dx \right| \\
&\le \int_0^M p x^{p-1} \left|\mathbf{P}(X > x) - \mathbf{P}(Y > x)\right| dx \le K M^p.
\end{aligned}$$

Together, these yield the first inequality.

The second follows by applying the first to $(X, Y)$ and to $(-X, -Y)$ and summing
or subtracting, depending on the parity of $p$. □

Theorem 8.2. Suppose $F_0 \ne F$ has finite moments of all orders, and let $p$ be defined
as in Theorem 7.7. Then, for any $0 < r < 2/(p + 1)$,

$$d_{TV}(F_n, F) \ge d_{KS}(F_n, F) = \Omega(r^n).$$

(The implicit multiplicative constant depends on both $F_0$ and the choice of $r$.)

Proof. Let $K_n := d_{KS}(F_n, F) > 0$. If we apply Lemma 8.1 and then use Lemma 7.10,
we find, for any $q > p$,

$$\begin{aligned}
|\mathbf{E} Z_0^p - \mathbf{E} Y^p| \left(\frac{2}{p+1}\right)^n &\le 2K_n M^p + \mathbf{E}(|Z_n|^p; |Z_n| > M) + \mathbf{E}(|Y|^p; |Y| > M) \\
&\le 2K_n M^p + M^{-(q-p)}\, \mathbf{E}|Z_n|^q + M^{-(q-p)}\, \mathbf{E}|Y|^q,
\end{aligned} \tag{8.1}$$

for any $0 < M < \infty$. It follows from Lemma 7.2 that $\mathbf{E}|Z_n|^q \le C_q$, for some $C_q$ not
depending on $n$. Choosing $M = K_n^{-1/q}$ thus gives, with $c := |\mathbf{E} Z_0^p - \mathbf{E} Y^p| > 0$,

$$c \left(\frac{2}{p+1}\right)^n \le \left(2 + C_q + \mathbf{E}|Y|^q\right) K_n^{(q-p)/q},$$

and thus $K_n = \Omega(r^n)$ with $r = \left(\frac{2}{p+1}\right)^{q/(q-p)}$. The result follows, since $r \to 2/(p + 1)$ as
$q \to \infty$. □

8.2 Density functions and characteristic functions 

We immediately obtain results for the density functions $f_n$, which by Theorem 3.1 exist
at least for $n \ge 3$ and by Theorem 3.2 converge uniformly, at a geometric rate, to the
density $f$ of $Y$.

Corollary 8.3. Suppose $F_0 \ne F$ has finite moments of all orders, and let $p$ be defined
as in Theorem 7.7. Then, for any $0 < r < 2/(p + 1)$,

$$\int_{-\infty}^{\infty} |f_n(x) - f(x)|\, dx = \Omega(r^n) \tag{8.2}$$

and

$$\sup_x |f_n(x) - f(x)| = \Omega(r^n). \tag{8.3}$$

Proof. The estimate (8.2) follows from Theorem 8.2, because (whenever $f_n$ exists)
$\int_{-\infty}^{\infty} |f_n(x) - f(x)|\, dx = 2 d_{TV}(F_n, F)$.

The estimate (8.3) follows from Theorem 8.2 using inequality (3.12) and the discussion
following it. □

Similarly, we have a geometric lower bound for the $L^1(\mathbb{R})$ and $L^\infty(\mathbb{R})$ distances of
the characteristic functions.

Corollary 8.4. Suppose $F_0 \ne F$ has finite moments of all orders, and let $p$ be defined
as in Theorem 7.7. Then, for any $0 < r < 2/(p + 1)$,

$$\int_{-\infty}^{\infty} |\phi_{Z_n}(t) - \phi_Y(t)|\, dt = \Omega(r^n) \tag{8.4}$$

and

$$\sup_t |\phi_{Z_n}(t) - \phi_Y(t)| = \Omega(r^n). \tag{8.5}$$

Proof. The estimate (8.4) is immediate from Corollary 8.3 and inequality (3.8). Next,
(8.4) for some $r = r_0$ and Theorem 2.1 imply (8.5) for any $r < r_0$ by the argument used
to show (3.10) in the proof of Theorem 3.2. □

It is not too hard to extend Corollaries 8.3 and 8.4 to any $L^q(\mathbb{R})$ distance, $1 \le q \le \infty$.



8.3 Moment generating functions 

Finally, we consider lower bounds for the convergence of moment generating functions.
We assume for simplicity that $Z_0$ has an everywhere finite moment generating function,
and know by Theorem 5.1 that then $\psi_{Z_n}$ converges to $\psi_Y$ pointwise, and uniformly on
compact sets, with geometric rate. For lower bounds we first note that if $F_0 \ne F$ and
$p$ is as in Theorem 7.7, then the derivatives of $\psi_{Z_n}$ and $\psi_Y$ at the origin, which equal
the corresponding moments of $Z_n$ and $Y$, by Lemma 7.10 coincide up to order $p - 1$,
while the $p$th derivatives differ by $c\left(\frac{2}{p+1}\right)^n$ with $c = \mathbf{E} Z_0^p - \mathbf{E} Y^p \ne 0$. For $\lambda$ close to the
origin, this and a Taylor expansion show that $|\psi_{Z_n}(\lambda) - \psi_Y(\lambda)| = \Omega\left(|\lambda|^p \left(\frac{2}{p+1}\right)^n\right)$, but
the range of $\lambda$ where we can prove that this is valid depends on $n$.

Indeed, there is no general lower bound for $|\psi_{Z_n}(\lambda) - \psi_Y(\lambda)|$ for a fixed $\lambda$, since
there may be points $\lambda$ where $\psi_{Z_n}(\lambda)$ and $\psi_Y(\lambda)$ coincide "accidentally". For example,
suppose that $Z_0$ is bounded with $\mathbf{E} Z_0 = 0$ and $\operatorname{Var} Z_0 > \operatorname{Var} Y$. By induction, the
same holds for each $Z_n$: see Lemma 7.10 and note that $g(U)$ is bounded. Consequently,
for each $n$, Taylor's formula shows that $\psi_{Z_n}(\lambda) > \psi_Y(\lambda)$ for small positive $\lambda$, while
$\psi_{Z_n}(\lambda) = \exp(O(\lambda))$ and thus the results of Section 4 show that $\psi_{Z_n}(\lambda) < \psi_Y(\lambda)$ for large $\lambda$.
Hence there exists for every $n$ at least one positive $\lambda = \lambda_n$ such that $\psi_{Z_n}(\lambda) = \psi_Y(\lambda)$.
Nevertheless, such points have to be isolated, and if we consider the maximum deviation
over an interval, we have a geometric lower bound.

Theorem 8.5. Suppose $F_0 \ne F$ has everywhere finite moment generating function, and
let $(a, b)$ be a nonempty interval. Then there exists $r > 0$ such that
$\sup_{a < \lambda < b} |\psi_{Z_n}(\lambda) - \psi_Y(\lambda)| = \Omega(r^n)$.

Proof. We use the fact that the moment generating functions $\psi_{Z_n}$ and $\psi_Y$ are entire
analytic functions in the complex plane $\mathbb{C}$.

Let $R := |a| + |b| + 1$. There exists a (unique) function $\omega$ which is continuous on
$D_R := \{z \in \mathbb{C} : |z| \le R\}$ and harmonic in $\Omega := \{z : |z| < R\} \setminus [a, b]$ such that $\omega(z) = 0$
for $|z| = R$ and $\omega(z) = 1$ for $z \in [a, b]$; this function is called harmonic measure and
is probabilistically given by the probability that a Brownian motion starting at $z$ hits
$[a, b]$ before it hits $\{z : |z| = R\}$.

Let $f_n(z) := \psi_{Z_n}(z) - \psi_Y(z)$ and $u_n(z) := \ln |f_n(z)| \ge -\infty$. For $z \in D_R$,

$$|f_n(z)| \le |\psi_{Z_n}(z)| + |\psi_Y(z)| \le \psi_{Z_n}(R) + \psi_Y(R) + \psi_{Z_n}(-R) + \psi_Y(-R),$$

which by the results of Section 4 is bounded by some constant $A < \infty$ (depending on $Z_0$ but not
on $n$). Let further $\delta_n := \max_{a \le \lambda \le b} |f_n(\lambda)|$; we may of course restrict attention to those
values of $n$ satisfying $\delta_n \le 1$. Now $u_n(z) \le \ln A$ for $|z| = R$ and $u_n(z) \le \ln \delta_n$ for
$z \in [a, b]$; thus (since $A \ge 1$)

$$u_n(z) \le \ln A + (\ln \delta_n)\, \omega(z) \tag{8.6}$$

for every $z \in \partial\Omega$. Since $u_n$ is subharmonic and the right hand side is harmonic in $\Omega$
and continuous on its closure, (8.6) holds for every $z \in \overline{\Omega} = D_R$, cf. [16, Theorems 17.3
and 17.4]. In particular, setting $\varepsilon := \inf_{|z| \le 1} \omega(z) > 0$, we have

$$u_n(z) \le \ln A + \varepsilon \ln \delta_n, \qquad |z| \le 1,$$

or

$$|f_n(z)| \le A \delta_n^{\varepsilon}, \qquad |z| \le 1. \tag{8.7}$$

Let $p$ be as in Theorem 7.7. By (8.7) and Cauchy's estimates [16, Theorem 10.26],

$$|f_n^{(p)}(0)| \le p!\, A\, \delta_n^{\varepsilon}.$$

Since by Lemma 7.10

$$|f_n^{(p)}(0)| = |\mathbf{E} Z_n^p - \mathbf{E} Y^p| = \Omega\left(\left(\frac{2}{p+1}\right)^n\right),$$

it follows that $\delta_n = \Omega(r^n)$ with $r = \left(\frac{2}{p+1}\right)^{1/\varepsilon}$. □

References 

[1] Cambanis, S., Simons, G., and Stout, W. Inequalities for E k(X,Y) when the
marginals are fixed. Z. Wahrscheinlichkeitstheorie verw. Gebiete 36 (1976), 285-294.

[2] van der Corput, J. G. Zahlentheoretische Abschätzungen. Math. Ann. 84 (1921),
53-79.

[3] Dongarra, J. and Sullivan, F. Guest editors' introduction: the top 10 algorithms. 
Computing in Science & Engineering 2 (2000). 

[4] Eddy, W. F. and Schervish, M. J. How many comparisons does Quicksort use? 
J. Algorithms 19 (1995), 402-431. 

[5] Feller, W. An Introduction to Probability Theory and its Applications. Vol. II. 
Second edition. Wiley, New York, 1971. 






[6] Fill, J. A. and Janson, S. Smoothness and decay properties of the limiting Quicksort
density function. In D. Gardy and A. Mokkadem, editors, Mathematics and
Computer Science: Algorithms, Trees, Combinatorics and Probabilities, Trends in
Mathematics, pages 53-64. Birkhäuser Verlag, 2000. Refereed article, available from
http://www.mts.jhu.edu/~fill/ or http://www.math.uu.se/~svante/.

[7] Fill, J. A. and Janson, S. A characterization of the set of fixed points of the Quicksort
transformation. Electron. Comm. Probab. 5 (2000), 77-84 (electronic).

[8] Fill, J. A. and Janson, S. Quicksort asymptotics. In preparation.

[9] Hoare, C. A. R. Quicksort. Comput. J. 5 (1962), 10-15.

[10] JáJá, J. A perspective on quicksort. Computing in Science & Engineering 2 (2000).

[11] Knessl, C. and Szpankowski, W. Quicksort algorithm again revisited. Discrete
Math. Theor. Comput. Sci. 3 (1999), 43-64.

[12] Knuth, D. E. The Art of Computer Programming. Volume 3. Sorting and searching.
Second edition. Addison-Wesley, Reading, Mass., 1998.

[13] Montgomery, H. L. Ten Lectures on the Interface Between Analytic Number Theory
and Harmonic Analysis. CBMS Reg. Conf. Ser. Math. 84, AMS, Providence, R.I.,
1994.

[14] Régnier, M. A limiting distribution for quicksort. RAIRO Inform. Théor. Appl. 23
(1989), 335-343.

[15] Rösler, U. A limit theorem for 'Quicksort'. RAIRO Inform. Théor. Appl. 25 (1991),
85-100.

[16] Rudin, W. Real and Complex Analysis. Second edition. McGraw-Hill, New York,
1974.

[17] Schwartz, L. Théorie des Distributions. Second edition. Hermann, Paris, 1966.

[18] Tan, K. H. and Hadjicostas, P. Some properties of a limiting distribution in Quicksort.
Statist. Probab. Lett. 25 (1995), 87-94.



